Preface to the Second Edition


There is a lot that is different about this second edition. First, there is a co-author, 
without whose help this revision would not have been possible. Second, we have 
benefited from countless letters from readers and colleagues who have pointed out 
errors and omissions and have made valuable suggestions over the past 25 years. 
These communications make this revision worth the effort. Third, we have tried to 
update the content of the book while striving to preserve the character and spirit of 
the first edition. 
Here are some of the numerous changes that have been made: 


1. The Introduction section has been removed. We have also removed Chapter
14 on sequential statistical inference.

2. Many parts of the book have undergone substantial rewriting. For example,
Chapter 4 has many changes, such as inclusion of exchangeability. In Chapter
3 an introduction to characteristic functions has been added, in Chapter 5
some new distributions have been added, and in Chapter 6 there have been
many changes in proofs.

3. The statistical inference part of the book (Chapters 8 to 13) has been updated.
Thus in Chapter 8 we have expanded the coverage of invariance and have
included discussions of ancillary statistics and conjugate prior distributions.

4. Similar changes have been made in Chapter 9. A new section on locally most
powerful tests has been added.

5. Chapter 11 has been greatly revised and a discussion of invariant confidence
intervals has been added.

6. Chapter 13 has been completely rewritten in the light of increased emphasis
on nonparametric inference. We have expanded the discussion of U-statistics.
Later sections show the connection between commonly used tests and
U-statistics.

7. In Chapter 12, the notation has been changed to conform to the current
convention.

8. Many problems and examples have been added.

9. More figures have been added to illustrate examples and proofs.

10. Answers to selected problems have been provided.


We are truly grateful to the readers of the first edition for countless comments and 
suggestions and hope we will continue to hear from them about this edition. Please 
direct your comments to vrohatg@attglobal.net or to saleh@math.carleton.ca.

Special thanks are due Ms. Gillian Murray for her superb word processing of the 
manuscript, and Dr. Indar Bhatia for figures that appear in the text. Dr. Bhatia spent 
countless hours preparing the diagrams for publication. We also acknowledge the 
assistance of Dr. K. Selvavel. 


VIJAY K. ROHATGI 
A. K. Md. EHSANES SALEH 


Preface to the First Edition 


This book on probability theory and mathematical statistics is designed for a three- 
quarter course meeting four hours per week or a two-semester course meeting three 
hours per week. It is designed primarily for advanced seniors and beginning grad- 
uate students in mathematics, but it can also be used by students in physics and 
engineering with strong mathematical backgrounds. Let me emphasize that this is a 
mathematics text and not a “cookbook.” It should not be used as a text for service 
courses. 

The mathematics prerequisites for this book are modest. It is assumed that the 
reader has had basic courses in set theory and linear algebra and a solid course in 
advanced calculus. No prior knowledge of probability and/or statistics is assumed. 

My aim is to provide a solid and well-balanced introduction to probability theory 
and mathematical statistics. It is assumed that students who wish to do graduate 
work in probability theory and mathematical statistics will be taking, concurrently 
with this course, a measure-theoretic course in analysis if they have not already had 
one. These students can go on to take advanced-level courses in probability theory 
or mathematical statistics after completing this course. 

This book consists of essentially three parts, although no such formal divisions 
are designated in the text. The first part consists of Chapters 1 through 6, which 
form the core of the probability portion of the course. The second part, Chapters 7 
through 11, covers the foundations of statistical inference. The third part consists of 
the remaining three chapters on special topics. For course sequences that separate 
probability and mathematical statistics, the first part of the book can be used for a 
course in probability theory, followed by a course in mathematical statistics based 
on the second part and, possibly, one or more chapters on special topics. 

The reader will find here a wealth of material. Although the topics covered are 
fairly conventional, the discussions and special topics included are not. Many pre- 
sentations give far more depth than is usually the case in a book at this level. Some 
special features of the book are the following: 


1. A well-referenced chapter on the preliminaries.

2. About 550 problems, over 350 worked-out examples, about 200 remarks, and
about 150 references.

3. An advance warning to readers wherever the details become too involved.
They can skip the later portion of the section in question on first reading
without destroying the continuity in any way.

4. Many results on characterizations of distributions (Chapter 5).

5. Proof of the central limit theorem by the method of operators and proof of
the strong law of large numbers (Chapter 6).

6. A section on minimal sufficient statistics (Chapter 8).

7. A chapter on special tests (Chapter 10).

8. A careful presentation of the theory of confidence intervals, including
Bayesian intervals and shortest-length confidence intervals (Chapter 11).

9. A chapter on the general linear hypothesis, which carries linear models
through to their use in basic analysis of variance (Chapter 12).

10. Sections on nonparametric estimation and robustness (Chapter 13).

11. Two sections on sequential estimation (Chapter 14).


The contents of this book were used in a one-year (two-semester) course that I 
taught three times at the Catholic University of America and once in a three-quarter 
course at Bowling Green State University. In the fall of 1973 my colleague, Professor 
Eugene Lukacs, taught the first quarter of this same course on the basis of my notes, 
which eventually became this book. I have always been able to cover this book (with 
few omissions) in a one-year course, lecturing three hours a week. An hour-long 
problem session every week is conducted by a senior graduate student. 

In a book of this size there are bound to be some misprints, errors, and ambiguities 
of presentation. I shall be grateful to any reader who brings these to my attention. 


V. K. ROHATGI 


Bowling Green, Ohio 
February 1975 
Enumeration of Theorems
and References 


The book is divided into 13 chapters, numbered 1 through 13. Each chapter is divided 
into several sections. Lemmas, theorems, equations, definitions, remarks, figures, and 
so on, are numbered consecutively within each section. Thus Theorem i.j.k refers 
to the kth theorem in Section j of Chapter i, Section i.j refers to the jth section of 
Chapter i, and so on. Theorem j refers to the jth theorem of the section in which it 
appears. A similar convention is used for equations except that equation numbers are 
enclosed in parentheses. Each section is followed by a set of problems for which the 
same numbering system is used. 

References are given at the end of the book and are denoted in the text by numbers
enclosed in square brackets, [ ]. If a citation is to a book, the notation [i, p. j]
refers to the jth page of the reference numbered [i].

A word about the proofs of results stated without proof in this book: If a reference 
appears immediately following or preceding the statement of a result, it generally 
means that the proof is beyond the scope of this text. If no reference is given, it 
indicates that the proof is left to the reader. Sometimes the reader is asked to supply 
the proof as a problem. 
CHAPTER 1


Probability 


1.1 INTRODUCTION


The theory of probability had its origin in gambling and games of chance. It owes 
much to the curiosity of gamblers who pestered their friends in the mathematical 
world with all sorts of questions. Unfortunately, this association with gambling con- 
tributed to very slow and sporadic growth of probability theory as a mathematical 
discipline. The mathematicians of the day took little or no interest in the develop- 
ment of any theory but looked only at the combinatorial reasoning involved in each 
problem. 

The first attempt at some mathematical rigor is credited to Laplace. In his monumental
work, Théorie analytique des probabilités (1812), Laplace gave the classical
definition of the probability of an event that can occur only in a finite number of 
ways as the proportion of the number of favorable outcomes to the total number of 
all possible outcomes, provided that all the outcomes are equally likely. According 
to this definition, computation of the probability of events was reduced to combina- 
torial counting problems. Even in those days, this definition was found inadequate. 
In addition to being circular and restrictive, it did not answer the question of what 
probability is; it only gave a practical method of computing the probabilities of some 
simple events. 

An extension of the classical definition of Laplace was used to evaluate the
probabilities of sets of events with infinite outcomes. The notion of equal likelihood of
certain events played a key role in this development. According to this extension,
if Ω is some region with a well-defined measure (length, area, volume, etc.), the
probability that a point chosen at random lies in a subregion A of Ω is the ratio
measure(A)/measure(Ω). Many problems of geometric probability were solved using
this extension. The trouble is that one can define "at random" in any way one
pleases, and different definitions lead to different answers. For example, Joseph
Bertrand, in his book Calcul des probabilités (Paris, 1889), cited a number of problems
in geometric probability where the result depended on the method of solution.
In Example 1.3.9 we discuss the famous Bertrand paradox and show that in reality
there is nothing paradoxical about Bertrand’s paradoxes; once we define probability
spaces carefully, the paradox is resolved. Nevertheless, difficulties encountered in
the field of geometric probability have been largely responsible for the slow growth
of probability theory and its tardy acceptance by mathematicians as a mathematical
discipline.

The mathematical theory of probability as we know it today is of comparatively 
recent origin. It was A. N. Kolmogorov who axiomatized probability in his funda- 
mental work, Foundations of the Theory of Probability (Berlin), in 1933. According 
to this development, random events are represented by sets and probability is just a 
normed measure defined on these sets. This measure-theoretic development not only 
provided a logically consistent foundation for probability theory but also joined it to 
the mainstream of modern mathematics. 

In this book we follow Kolmogorov’s axiomatic development. In Section 1.2 we 
introduce the notion of a sample space. In Section 1.3 we state Kolmogorov’s axioms 
of probability and study some simple consequences of these axioms. Section 1.4 is 
devoted to the computation of probability on finite sample spaces. Section 1.5 deals 
with conditional probability and Bayes rule, and Section 1.6 examines the indepen- 
dence of events. 


1.2 SAMPLE SPACE 


In most branches of knowledge, experiments are a way of life. In probability and 
statistics, too, we concern ourselves with special types of experiments. Consider the 
following examples. 


Example 1. A coin is tossed. Assuming that the coin does not land on the side, 
there are two possible outcomes of the experiment: heads and tails. On any perfor- 
mance of this experiment, one does not know what the outcome will be. The coin 
can be tossed as many times as desired. 


Example 2. A roulette wheel is a circular disk divided into 38 equal sectors numbered
from 0 to 36 and 00. A ball is rolled on the edge of the wheel, and the wheel is
rolled in the opposite direction. One bets on any of the 38 numbers or some combination
of them. One can also bet on a color, red or black. If the ball lands in the sector
numbered 32, say, anybody who bet on 32, or a combination including 32, wins; and
so on. In this experiment, all possible outcomes are known in advance, namely 00,
0, 1, 2, ..., 36, but on any performance of the experiment there is uncertainty as to
what the outcome will be, provided, of course, that the wheel is not rigged in any
manner. Clearly, the wheel can be rolled any number of times.


Example 3. A manufacturer produces 12-in. rulers. The experiment consists in
measuring as accurately as possible the length of a ruler produced by the manufacturer.
Because of errors in the production process, one does not know what the true
length of the ruler selected will be. It is clear, however, that the length will be, say,
between 11 and 13 in., or, if one wants to be safe, between 6 and 18 in.

Example 4. The length of life of a light bulb produced by a certain manufacturer
is recorded. In this case one does not know what the length of life will be for the
light bulb selected, but clearly one is aware in advance that it will be some number
between 0 and ∞ hours.


The experiments described above have certain common features. For each exper- 
iment, we know in advance all possible outcomes; that is, there are no surprises in 
store after any performance of the experiment. On any performance of the exper- 
iment, however, we do not know what the specific outcome will be; that is, there 
is uncertainty about the outcome on any performance of the experiment. Moreover, 
the experiment can be repeated under identical conditions. These features describe a 
random (or statistical) experiment. 


Definition 1. A random (or statistical) experiment is an experiment in which: 


(a) All outcomes of the experiment are known in advance. 

(b) Any performance of the experiment results in an outcome that is not known 
in advance. 

(c) The experiment can be repeated under identical conditions. 


In probability theory we study this uncertainty of a random experiment. It is
convenient to associate with each such experiment a set Ω, the set of all possible
outcomes of the experiment. To engage in any meaningful discussion about the
experiment, we associate with Ω a σ-field S of subsets of Ω. We recall that a σ-field is
a nonempty class of subsets of Ω that is closed under the formation of countable
unions and complements and contains the null set ∅.


Definition 2. The sample space of a statistical experiment is a pair (Ω, S), where


(a) Ω is the set of all possible outcomes of the experiment.
(b) S is a σ-field of subsets of Ω.


The elements of Ω are called sample points. Any set A ∈ S is known as an
event. Clearly, A is a collection of sample points. We say that an event A happens
if the outcome of the experiment corresponds to a point in A. Each one-point set is
known as a simple or elementary event. If the set Ω contains only a finite number of
points, we say that (Ω, S) is a finite sample space. If Ω contains at most a countable
number of points, we call (Ω, S) a discrete sample space. If, however, Ω contains
uncountably many points, we say that (Ω, S) is an uncountable sample space. In
particular, if $\Omega = \mathcal{R}_k$ or some rectangle in $\mathcal{R}_k$, we call it a continuous sample space.


Remark 1. The choice of S is an important one, and some remarks are in order.
If Ω contains at most a countable number of points, we can always take S to be the
class of all subsets of Ω. This is certainly a σ-field. Each one-point set is a member
of S and is the fundamental object of interest. Every subset of Ω is an event. If Ω
has uncountably many points, the class of all subsets of Ω is still a σ-field, but it is
much too large a class of sets to be of interest. One of the most important examples
of an uncountable sample space is the case in which $\Omega = \mathcal{R}$ or Ω is an interval in $\mathcal{R}$.
In this case we would like all one-point subsets of Ω and all intervals (closed, open,
or semiclosed) to be events. We use our knowledge of analysis to specify S. We will
not go into detail here except to recall that the class of all semiclosed intervals (a, b]
generates a class $\mathcal{B}_1$ that is a σ-field on $\mathcal{R}$. This class contains all one-point sets and
all intervals (finite or infinite). We take $S = \mathcal{B}_1$. Since we will be dealing mostly
with the one-dimensional case, we write $\mathcal{B}$ instead of $\mathcal{B}_1$. There are many subsets
of $\mathcal{R}$ that are not in $\mathcal{B}_1$, but we do not demonstrate this fact here. We refer the reader
to Halmos [39], Royden [94], or Kolmogorov and Fomin [52] for further details.


Example 5. Let us toss a coin. The set Ω is the set of symbols H and T, where H
denotes head and T represents tail. Also, S is the class of all subsets of Ω, namely,
{{H}, {T}, {H, T}, ∅}. If the coin is tossed two times, then

$$
\Omega = \{(H, H), (H, T), (T, H), (T, T)\},
$$

and

$$
\begin{aligned}
S = \{ & \emptyset, \{(H, H)\}, \{(H, T)\}, \{(T, H)\}, \{(T, T)\}, \{(H, H), (H, T)\}, \{(H, H), (T, H)\}, \\
& \{(H, H), (T, T)\}, \{(H, T), (T, H)\}, \{(T, T), (T, H)\}, \{(T, T), (H, T)\}, \\
& \{(H, H), (H, T), (T, H)\}, \{(H, H), (H, T), (T, T)\}, \\
& \{(H, H), (T, H), (T, T)\}, \{(H, T), (T, H), (T, T)\}, \Omega\},
\end{aligned}
$$

where the first element of a pair denotes the outcome of the first toss, and the second
element, the outcome of the second toss. The event "at least one head" consists of
sample points (H, H), (H, T), (T, H). The event "at most one head" is the collection of
sample points (H, T), (T, H), (T, T).
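
The two-toss sample space of Example 5 is small enough to enumerate by machine. The following is a minimal Python sketch, not part of the original text; the names `omega`, `power_set`, and `sigma_field` are chosen here for illustration only. It lists the sample points, builds the class of all subsets, and extracts the two events discussed above.

```python
from itertools import chain, combinations, product

# Sample points for two tosses: ordered pairs (first toss, second toss).
omega = list(product("HT", repeat=2))

def power_set(points):
    """Return every subset of `points` as a frozenset.

    On a finite set, the class of all subsets is the largest sigma-field.
    """
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]

sigma_field = power_set(omega)

# Events are subsets of omega picked out by a condition on the sample point.
at_least_one_head = frozenset(w for w in omega if w.count("H") >= 1)
at_most_one_head = frozenset(w for w in omega if w.count("H") <= 1)

print(len(omega), len(sigma_field))  # 4 sample points, 2**4 = 16 events
print(sorted(at_least_one_head))
print(sorted(at_most_one_head))
```

Both events are members of `sigma_field`, as they must be when S is the class of all subsets.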


Example 6. A die is rolled n times. The sample space is the pair (Ω, S), where
Ω is the set of all n-tuples $(x_1, x_2, \ldots, x_n)$, $x_i \in \{1, 2, 3, 4, 5, 6\}$, $i = 1, 2, \ldots, n$,
and S is the class of all subsets of Ω. Ω contains $6^n$ elementary events. The event A
that 1 shows at least once is the set

$$
\begin{aligned}
A &= \{(x_1, x_2, \ldots, x_n) : \text{at least one of the } x_i\text{'s is } 1\} \\
  &= \Omega - \{(x_1, x_2, \ldots, x_n) : \text{none of the } x_i\text{'s is } 1\} \\
  &= \Omega - \{(x_1, x_2, \ldots, x_n) : x_i \in \{2, 3, 4, 5, 6\},\ i = 1, 2, \ldots, n\}.
\end{aligned}
$$

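
The complement description of A in Example 6 gives a count directly: Ω has 6^n points and the set in which no coordinate equals 1 has 5^n points, so A has 6^n − 5^n points. A brute-force check for small n, as a sketch (the function name `count_at_least_one_1` is hypothetical):

```python
from itertools import product

def count_at_least_one_1(n):
    """Count n-tuples of die faces in which face 1 appears at least once."""
    return sum(1 for x in product(range(1, 7), repeat=n) if 1 in x)

for n in range(1, 6):
    # Complement rule on counts: |A| = |Omega| - |{no coordinate is 1}|.
    assert count_at_least_one_1(n) == 6**n - 5**n
    print(n, 6**n, count_at_least_one_1(n))
```

The enumeration grows like 6^n, so it is only practical for small n; the closed form is what one would use in general.
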
Example 7. A coin is tossed until the first head appears. Then

$$
\Omega = \{H, (T, H), (T, T, H), (T, T, T, H), \ldots\},
$$

and S is the class of all subsets of Ω. An equivalent way of writing Ω would be to
look at the number of tosses required for the first head. Clearly, this number can take
values 1, 2, 3, ..., so that Ω is the set of all positive integers. Thus S is the class of
all subsets of positive integers.


Example 8. Consider a pointer that is free to spin about the center of a circle.
If the pointer is spun by an impulse, it will finally come to rest at some point. On
the assumption that the mechanism is not rigged in any manner, each point on the
circumference is a possible outcome of the experiment. The set Ω consists of all
points 0 ≤ x < 2πr, where r is the radius of the circle. Every one-point set {x} is
a simple event, namely, that the pointer will come to rest at x. The events of interest
are those in which the pointer stops at a point belonging to a specified arc. Here S is
taken to be the Borel σ-field of subsets of [0, 2πr).


Example 9. A rod of length l is thrown onto a flat table, which is ruled with
parallel lines at distance 2l. The experiment consists in noting whether or not the rod
intersects one of the ruled lines.

Let r denote the distance from the center of the rod to the nearest ruled line, and
let θ be the angle that the axis of the rod makes with this line (Fig. 1). Every outcome
of this experiment corresponds to a point (r, θ) in the plane. As Ω we take the set of
all points (r, θ) in {(r, θ): 0 ≤ r ≤ l, 0 ≤ θ < π}. For S we take the Borel σ-field,
$\mathcal{B}_2$, of subsets of Ω, that is, the smallest σ-field generated by rectangles of the form

$$
\{(x, y) : a < x \le b,\ c < y \le d,\ 0 \le a < b \le l,\ 0 \le c < d < \pi\}.
$$


Fig. 1. 
Clearly, the rod will intersect a ruled line if and only if the center of the rod lies in
the area enclosed by the locus of the center of the rod (while one end touches the 
nearest line) and the nearest line (shaded area in Fig. 2). 


Remark 2. From the discussion above it should be clear that in the discrete case
there is really no problem. Every one-point set is also an event, and S is the class of
all subsets of Ω. The problem, if there is any, arises only in regard to uncountable
sample spaces. The reader has to remember only that in this case not all subsets of
Ω are events. The case of most interest is the one in which $\Omega = \mathcal{R}_k$. In this case
roughly all sets that have a well-defined volume (or area or length) are events. Not
every set has the property in question, but sets that lack it are not easy to find and
one does not encounter them in practice.


PROBLEMS 1.2 


1. A club has five members, A, B, C, D, and E. It is required to select a chairman 
and a secretary. Assuming that one member cannot occupy both positions, write 
the sample space associated with these selections. What is the event that member 
A is an officeholder? 


2. In each of the following experiments, what is the sample space? 


(a) In a survey of families with three children, the genders of the children are 
recorded in increasing order of age. 


(b) The experiment consists of selecting four items from a manufacturer’s output 
and observing whether or not each item is defective. 

(c) A given book is opened to any page, and the number of misprints is counted.


(d) Two cards are drawn from an ordinary deck of cards (i) with replacement, 
and (ii) without replacement. 


3. Let A, B, C be three arbitrary events on a sample space (Ω, S). What is the event
that only A occurs? What is the event that at least two of A, B, C occur? What is 
the event that both A and C, but not B, occur? What is the event that at most one 
of A, B, C occurs? 


1.3 PROBABILITY AXIOMS


Let (Ω, S) be the sample space associated with a statistical experiment. In this
section we define a probability set function and study some of its properties.


Definition 1. Let (Ω, S) be a sample space. A set function P defined on S is
called a probability measure (or simply, probability) if it satisfies the following
conditions:


(i) P(A) ≥ 0 for all A ∈ S.
(ii) P(Ω) = 1.
(iii) Let $\{A_j\}$, $A_j \in S$, $j = 1, 2, \ldots$, be a disjoint sequence of sets; that is,
$A_j \cap A_k = \emptyset$ for $j \ne k$, where ∅ is the null set. Then

$$
P\Bigl(\sum_{j=1}^{\infty} A_j\Bigr) = \sum_{j=1}^{\infty} P(A_j), \tag{1}
$$

where we have used the notation $\sum_{j=1}^{\infty} A_j$ to denote the union of the
disjoint sets $A_j$.


We call P(A) the probability of event A. If there is no confusion, we will write
PA instead of P(A). Property (iii) is called countable additivity. That P∅ = 0 and
P is also finitely additive follows from it.


Remark 1. If Ω is discrete and contains at most n (< ∞) points, each single-point
set $\{\omega_j\}$, $j = 1, 2, \ldots, n$, is an elementary event, and it is sufficient to assign
probability to each $\{\omega_j\}$. Then if A ∈ S, where S is the class of all subsets of Ω,
$PA = \sum_{\omega \in A} P\{\omega\}$. One such assignment is the equally likely assignment or the
assignment of uniform probabilities. According to this assignment, $P\{\omega_j\} = 1/n$,
$j = 1, 2, \ldots, n$. Thus PA = m/n if A contains m elementary events, 1 ≤ m ≤ n.
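
The recipe in this remark (assign a probability to each elementary event, then sum over the points of A) can be sketched in a few lines of Python. The helper `make_probability` and the die example are illustrative assumptions, not part of the text; exact fractions are used so that the check P(Ω) = 1 is not disturbed by rounding.

```python
from fractions import Fraction

def make_probability(point_masses):
    """Build P(A) = sum of point masses over A on a finite sample space.

    `point_masses` maps each sample point to P{point}; the masses must be
    nonnegative and sum to 1, mirroring conditions (i) and (ii).
    """
    assert all(p >= 0 for p in point_masses.values())
    assert sum(point_masses.values()) == 1
    return lambda event: sum((point_masses[w] for w in event), Fraction(0))

# Equally likely assignment on the six faces of a die: P{w} = 1/n with n = 6.
omega = range(1, 7)
P = make_probability({w: Fraction(1, 6) for w in omega})

even = {2, 4, 6}
print(P(even))  # 1/2, that is, m/n with m = 3 and n = 6
```

Any other nonnegative masses summing to 1 could be passed in place of the uniform ones; finite additivity then holds automatically because P is defined as a sum over points.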


Remark 2. If Ω is discrete and contains a countable number of points, one cannot
make an equally likely assignment of probabilities. It suffices to make the assignment
for each elementary event. If A ∈ S, where S is the class of all subsets of Ω,
define $PA = \sum_{\omega \in A} P\{\omega\}$.

Remark 3. If Ω contains uncountably many points, each one-point set is an elementary
event, and again one cannot make an equally likely assignment of probabilities.
Indeed, one cannot assign positive probability to each elementary event without
violating the axiom PΩ = 1. In this case one assigns probabilities to compound
events consisting of intervals. For example, if Ω = [0, 1] and S is the Borel σ-field
of all subsets of Ω, the assignment P(I) = length of I, where I is a subinterval of
Ω, defines a probability.


Definition 2. The triple (Ω, S, P) is called a probability space.


Definition 3. Let A ∈ S. We say that the odds for A are a to b if PA = a/(a + b),
and then the odds against A are b to a.


In many games of chance, probability is often stated in terms of odds against an
event. Thus in horse racing a two-dollar bet on a horse to win with odds of 2 to 1
(against) pays approximately six dollars if the horse wins the race. In this case the
probability of winning is 1/3.
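
The conversion between odds and probability in Definition 3 is mechanical. A small Python sketch follows; the function names are hypothetical, and `payout` assumes an idealized bet with no track commission, which is why the text says "approximately" six dollars.

```python
from fractions import Fraction

def prob_from_odds_for(a, b):
    """Odds for A are a to b, so PA = a / (a + b)."""
    return Fraction(a, a + b)

def prob_from_odds_against(b, a):
    """Odds against A are b to a, so PA = a / (a + b)."""
    return Fraction(a, a + b)

def payout(stake, b, a):
    """Total return (stake plus winnings) on a winning bet at odds b to a against.

    Idealized: ignores any commission taken by the track.
    """
    return stake + stake * Fraction(b, a)

print(prob_from_odds_against(2, 1))  # 1/3
print(payout(2, 2, 1))               # 6
```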


Example 1. Let us toss a coin. The sample space is (Ω, S), where Ω = {H, T}
and S is the σ-field of all subsets of Ω. Let us define P on S as follows:

$$
P\{H\} = \tfrac{1}{2} \quad\text{and}\quad P\{T\} = \tfrac{1}{2}.
$$

Then P clearly defines a probability. Similarly, P{H} = 2/3, P{T} = 1/3, and P{H} =
1, P{T} = 0 are probabilities defined on S. Indeed,

$$
P\{H\} = p \quad\text{and}\quad P\{T\} = 1 - p \qquad (0 \le p \le 1)
$$

defines a probability on (Ω, S).


Example 2. Let Ω = {1, 2, 3, ...} be the set of positive integers, and let S be the
class of all subsets of Ω. Define P on S as follows:

$$
P\{i\} = \frac{1}{2^i}, \qquad i = 1, 2, \ldots.
$$

Then $\sum_{i=1}^{\infty} P\{i\} = 1$, and P defines a probability.
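
That the point masses in Example 2 sum to 1 is the geometric series. Assuming the assignment P{i} = 1/2^i, the partial sums equal 1 − 1/2^n exactly, which the following sketch confirms with rational arithmetic (the name `partial_sum` is illustrative):

```python
from fractions import Fraction

def partial_sum(n):
    """Exact value of P{1} + ... + P{n} when P{i} = 1/2**i."""
    return sum(Fraction(1, 2**i) for i in range(1, n + 1))

for n in (1, 2, 5, 10, 20):
    s = partial_sum(n)
    # Closed form of the geometric partial sum: 1 - 2**(-n).
    assert s == 1 - Fraction(1, 2**n)
    print(n, s, float(s))
```

The partial sums increase to 1, so P(Ω) = 1 as axiom (ii) requires.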


Example 3. Let Ω = (0, ∞) and $S = \mathcal{B}$, the Borel σ-field on Ω. Define P as
follows: For each interval I ⊆ Ω,

$$
PI = \int_I e^{-x}\,dx.
$$

Clearly, PI ≥ 0, PΩ = 1, and P is countably additive by properties of integrals.

Theorem 1. P is monotone and subtractive; that is, if A, B ∈ S and A ⊆ B, then
PA ≤ PB and P(B − A) = PB − PA, where $B - A = B \cap A^c$, $A^c$ being the
complement of the event A.

Proof. If A ⊆ B, then

$$
B = (A \cap B) + (B - A) = A + (B - A),
$$

and it follows that PB = PA + P(B − A).

Corollary. For all A ∈ S, 0 ≤ PA ≤ 1.

Remark 4. We wish to emphasize that if PA = 0 for some A ∈ S, we call A an
event with zero probability or a null event. However, it does not follow that A = ∅.
Similarly, if PB = 1 for some B ∈ S, we call B a certain event, but it does not
follow that B = Ω.

Theorem 2 (Addition Rule). If A, B ∈ S, then

$$
P(A \cup B) = PA + PB - P(A \cap B). \tag{2}
$$

Proof. Clearly,

$$
A \cup B = (A - B) + (B - A) + (A \cap B)
$$

and

$$
A = (A \cap B) + (A - B), \qquad B = (A \cap B) + (B - A).
$$

The result follows by countable additivity of P.

Corollary 1. P is subadditive, that is, if A, B ∈ S, then

$$
P(A \cup B) \le PA + PB. \tag{3}
$$

Corollary 1 can be extended to an arbitrary number of events $A_j$,

$$
P\Bigl(\bigcup_{j} A_j\Bigr) \le \sum_{j} PA_j. \tag{4}
$$

Corollary 2. If $B = A^c$, then A and B are disjoint and

$$
PA = 1 - PA^c. \tag{5}
$$


The following generalization of (2) is left as an exercise. 
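
Theorem 3 (Principle of Inclusion-Exclusion). Let $A_1, A_2, \ldots, A_n \in S$.
Then

$$
\begin{aligned}
P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) = {} & \sum_{k=1}^{n} PA_k - \sum_{k_1 < k_2}^{n} P(A_{k_1} \cap A_{k_2}) \\
& + \sum_{k_1 < k_2 < k_3}^{n} P(A_{k_1} \cap A_{k_2} \cap A_{k_3}) \\
& + \cdots + (-1)^{n+1} P\Bigl(\bigcap_{k=1}^{n} A_k\Bigr).
\end{aligned} \tag{6}
$$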
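
Formula (6) can be checked numerically on a finite sample space with equally likely points. The sketch below is an illustration under assumptions chosen here, not taken from the text: Ω = {1, ..., 30} with uniform probability, and the three events "divisible by 2", "divisible by 3", "divisible by 5".

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Uniform probability on a finite sample space."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events, omega):
    """Right-hand side of (6): alternating sum over all k-fold intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group), omega)
    return total

omega = frozenset(range(1, 31))
events = [frozenset(w for w in omega if w % d == 0) for d in (2, 3, 5)]

lhs = prob(frozenset.union(*events), omega)   # P(A_1 u A_2 u A_3) directly
rhs = inclusion_exclusion(events, omega)      # via formula (6)
print(lhs, rhs)  # 11/15 11/15
```

The number of terms in (6) is 2^n − 1, so the direct union is cheaper to compute when the sets are available explicitly; the formula is useful when only the intersection probabilities are known.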


Example 4. A die is rolled twice. Let all the elementary events in Ω = {(i, j):
i, j = 1, 2, ..., 6} be assigned the same probability. Let A be the event that the
first throw shows a number ≤ 2, and B be the event that the second throw shows at
least 5. Then

$$
\begin{aligned}
A &= \{(i, j) : 1 \le i \le 2,\ j = 1, 2, \ldots, 6\}, \\
B &= \{(i, j) : 5 \le j \le 6,\ i = 1, 2, \ldots, 6\}, \\
A \cap B &= \{(1, 5), (1, 6), (2, 5), (2, 6)\};
\end{aligned}
$$

and

$$
P(A \cup B) = PA + PB - P(A \cap B) = \tfrac{1}{3} + \tfrac{1}{3} - \tfrac{4}{36} = \tfrac{5}{9}.
$$
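
The computation in Example 4 can be reproduced by listing the 36 equally likely outcomes. A minimal sketch (variable names are chosen here for illustration):

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first throw, second throw).
omega = set(product(range(1, 7), repeat=2))

A = {(i, j) for (i, j) in omega if i <= 2}   # first throw shows at most 2
B = {(i, j) for (i, j) in omega if j >= 5}   # second throw shows at least 5

def P(event):
    """Equally likely assignment: P(event) = |event| / |omega|."""
    return Fraction(len(event), len(omega))

print(P(A), P(B), P(A & B), P(A | B))  # 1/3 1/3 1/9 5/9

# Addition rule (2), checked on this example.
assert P(A | B) == P(A) + P(B) - P(A & B)
```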


Example 5. A coin is tossed three times. Let us assign equal probability to each
of the $2^3$ elementary events in Ω. Let A be the event that at least one head shows up
in three throws. Then

$$
\begin{aligned}
P(A) &= 1 - P(A^c) \\
     &= 1 - P(\text{no heads}) \\
     &= 1 - P(TTT) = \tfrac{7}{8}.
\end{aligned}
$$


We next derive two useful inequalities. 


Theorem 4 (Bonferroni’s Inequality). Given n (> 1) events $A_1, A_2, \ldots, A_n$,

$$
\sum_{i=1}^{n} PA_i - \sum_{1 \le i < j \le n} P(A_i \cap A_j) \le P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le \sum_{i=1}^{n} PA_i. \tag{7}
$$
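
Proof. In view of (4), it suffices to prove the left side of (7). The proof is by
induction. The inequality on the left is true for n = 2 since

$$
PA_1 + PA_2 - P(A_1 \cap A_2) = P(A_1 \cup A_2).
$$

For n = 3,

$$
P\Bigl(\bigcup_{i=1}^{3} A_i\Bigr) = \sum_{i=1}^{3} PA_i - \sum_{1 \le i < j \le 3} P(A_i \cap A_j) + P(A_1 \cap A_2 \cap A_3),
$$

and the result holds. Assuming that (7) holds for 3 < m ≤ n − 1, we show that it
also holds for m + 1:

$$
\begin{aligned}
P\Bigl(\bigcup_{i=1}^{m+1} A_i\Bigr)
&= P\Bigl(\Bigl(\bigcup_{i=1}^{m} A_i\Bigr) \cup A_{m+1}\Bigr) \\
&= P\Bigl(\bigcup_{i=1}^{m} A_i\Bigr) + PA_{m+1} - P\Bigl(A_{m+1} \cap \Bigl(\bigcup_{i=1}^{m} A_i\Bigr)\Bigr) \\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{1 \le i < j \le m} P(A_i \cap A_j) - P\Bigl(\bigcup_{i=1}^{m} (A_i \cap A_{m+1})\Bigr) \\
&\ge \sum_{i=1}^{m+1} PA_i - \sum_{1 \le i < j \le m} P(A_i \cap A_j) - \sum_{i=1}^{m} P(A_i \cap A_{m+1}) \\
&= \sum_{i=1}^{m+1} PA_i - \sum_{1 \le i < j \le m+1} P(A_i \cap A_j).
\end{aligned}
$$
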
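
To see how tight the two bounds in (7) can be, here is a sketch on the two-dice sample space with three events chosen purely for illustration (first die at most 2, second die at least 5, total equal to 7). It prints the lower bound, the exact probability of the union, and the upper bound.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product(range(1, 7), repeat=2))
events = [
    {w for w in omega if w[0] <= 2},     # first die shows at most 2
    {w for w in omega if w[1] >= 5},     # second die shows at least 5
    {w for w in omega if sum(w) == 7},   # total is 7
]

def P(event):
    """Equally likely assignment on the 36 outcomes."""
    return Fraction(len(event), len(omega))

upper = sum(P(a) for a in events)                                # right side of (7)
lower = upper - sum(P(a & b) for a, b in combinations(events, 2))  # left side of (7)
union = P(set().union(*events))

print(lower, union, upper)  # 11/18 2/3 5/6
assert lower <= union <= upper
```

The gap between `union` and `lower` is exactly the probability of the triple intersection here, in agreement with the n = 3 case of the proof.
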
Theorem 5 (Boole’s Inequality). For any two events A and B,

$$
P(A \cap B) \ge 1 - PA^c - PB^c. \tag{8}
$$

Corollary 1. Let $\{A_j\}$, $j = 1, 2, \ldots$, be a countable sequence of events; then

$$
P\Bigl(\bigcap_{j} A_j\Bigr) \ge 1 - \sum_{j} P(A_j^c). \tag{9}
$$

Proof. Take

$$
A = A_1 \quad\text{and}\quad B = \bigcap_{j=2}^{\infty} A_j
$$

in (8).

Corollary 2 (Implication Rule). If A, B, C ∈ S and A and B imply C, then

$$
PC^c \le PA^c + PB^c. \tag{10}
$$


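Let $\{A_n\}$ be a sequence of sets. The set of all points ω ∈ Ω that belong to $A_n$
for infinitely many values of n is known as the limit superior of the sequence and is
denoted by

$$
\limsup_{n \to \infty} A_n \quad\text{or}\quad \varlimsup_{n \to \infty} A_n.
$$

The set of all points that belong to $A_n$ for all but a finite number of values of n is
known as the limit inferior of the sequence $\{A_n\}$ and is denoted by

$$
\liminf_{n \to \infty} A_n \quad\text{or}\quad \varliminf_{n \to \infty} A_n.
$$

If

$$
\varliminf_{n \to \infty} A_n = \varlimsup_{n \to \infty} A_n,
$$

we say that the limit exists and write $\lim_{n \to \infty} A_n$ for the common set and call it the
limit set.

We have

$$
\varliminf_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \subseteq \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k = \varlimsup_{n \to \infty} A_n.
$$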

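If the sequence $\{A_n\}$ is such that $A_n \subseteq A_{n+1}$, for n = 1, 2, ..., it is called
nondecreasing; if $A_n \supseteq A_{n+1}$, n = 1, 2, ..., it is called nonincreasing. If the sequence
$A_n$ is nondecreasing, we write $A_n \uparrow$; if $A_n$ is nonincreasing, we write $A_n \downarrow$. Clearly, if
$A_n \uparrow$ or $A_n \downarrow$, the limit exists and we have

$$
\lim_{n} A_n = \bigcup_{n=1}^{\infty} A_n \qquad \text{if } A_n \uparrow
$$

and

$$
\lim_{n} A_n = \bigcap_{n=1}^{\infty} A_n \qquad \text{if } A_n \downarrow.
$$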

Theorem 6. Let $\{A_n\}$ be a nondecreasing sequence of events in S; that is, $A_n \in S$,
n = 1, 2, ..., and

$$
A_n \supseteq A_{n-1}, \qquad n = 2, 3, \ldots.
$$

Then

$$
\lim_{n \to \infty} PA_n = P\Bigl(\lim_{n \to \infty} A_n\Bigr) = P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr). \tag{11}
$$

Proof. Let

$$
A = \bigcup_{j=1}^{\infty} A_j.
$$

Then

$$
A = A_n + \sum_{j=n}^{\infty} (A_{j+1} - A_j).
$$

By countable additivity we have

$$
PA = PA_n + \sum_{j=n}^{\infty} P(A_{j+1} - A_j),
$$

and letting n → ∞, we see that

$$
PA = \lim_{n \to \infty} PA_n + \lim_{n \to \infty} \sum_{j=n}^{\infty} P(A_{j+1} - A_j).
$$

The second term on the right tends to zero as n → ∞ since the sum
$\sum_{j=1}^{\infty} P(A_{j+1} - A_j) \le 1$ and each summand is nonnegative. The result follows.


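Corollary. Let $\{A_n\}$ be a nonincreasing sequence of events in S. Then

$$
\lim_{n \to \infty} PA_n = P\Bigl(\lim_{n \to \infty} A_n\Bigr) = P\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr). \tag{12}
$$

Proof. Consider the nondecreasing sequence of events $\{A_n^c\}$. Then

$$
\lim_{n \to \infty} A_n^c = \bigcup_{j=1}^{\infty} A_j^c = A^c,
$$

where $A = \bigcap_{j=1}^{\infty} A_j$. It follows from Theorem 6 that

$$
\lim_{n \to \infty} PA_n^c = P\Bigl(\lim_{n \to \infty} A_n^c\Bigr) = P\Bigl(\bigcup_{j=1}^{\infty} A_j^c\Bigr) = P(A^c).
$$

In other words,

$$
\lim_{n \to \infty} (1 - PA_n) = 1 - PA,
$$

as asserted.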


Remark 5. Theorem 6 and its corollary will be used quite frequently in subse- 
quent chapters. Property (11) is called the continuity of P from below, and (12) is 
known as the continuity of P from above. Thus Theorem 6 and its corollary assure 
us that the set function P is continuous from above and below. 
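
As a small worked illustration of continuity from below, consider the length assignment on Ω = [0, 1] described in Remark 3. The sequence $A_n$ below is chosen here only as an example and does not appear in the text.

```latex
% Assumptions: \Omega = [0,1], P(I) = length of I, and A_n = [0, 1 - 1/n].
\begin{align*}
A_n &= [0,\, 1 - 1/n] \subseteq [0,\, 1 - 1/(n+1)] = A_{n+1}, \\
\lim_{n \to \infty} A_n &= \bigcup_{n=1}^{\infty} A_n = [0, 1), \\
\lim_{n \to \infty} PA_n &= \lim_{n \to \infty} (1 - 1/n) = 1 = P[0, 1).
\end{align*}
```

The limit set [0, 1) has probability 1 although it is not all of Ω; it differs from Ω by the one-point set {1}, which is a null event in the sense of Remark 4.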


We conclude this section with some remarks concerning the use of the word ran- 
dom in this book. In probability theory random has essentially three meanings. First, 
in sampling from a finite population, a sample is said to be a random sample if at 
each draw all members available for selection have the same probability of being 
included. We discuss sampling from a finite population in Section 1.4. Second, we 
speak of a random sample from a probability distribution. This notion is formal- 
ized in Section 7.2. The third meaning arises in the context of geometric probability, 
where statements such as “a point is chosen randomly from the interval (a, b)” and 
“a point is picked randomly from a unit square” are frequently encountered. Once we 
have studied random variables and their distributions, problems involving geometric 
probabilities may be formulated in terms of problems involving independent uni- 
formly distributed random variables, and these statements can be given appropriate 
interpretations. 

Roughly speaking, these statements involve a certain assignment of probability. The word random expresses our desire to assign equal probability to sets of equal lengths, areas, or volumes. Let $\Omega \subset \mathcal{R}_n$ be a given set, and A be a subset of $\Omega$. We are interested in the probability that a randomly chosen point in $\Omega$ falls in A. Here randomly chosen means that the point may be any point of $\Omega$ and that the probability of its falling in some subset A of $\Omega$ is proportional to the measure of A (independent of the location and shape of A). Assuming that both A and $\Omega$ have well-defined finite measures (length, area, volume, etc.), we define

$$PA = \frac{\text{measure}(A)}{\text{measure}(\Omega)}.$$

[In the language of measure theory we are assuming that $\Omega$ is a measurable subset of $\mathcal{R}_n$ that has a finite, positive Lebesgue measure. If A is any measurable set, $PA = \mu(A)/\mu(\Omega)$, where $\mu$ is the n-dimensional Lebesgue measure.] Thus, if a point is chosen at random from the interval (a, b), the probability that it lies in the interval (c, d), $a < c < d < b$, is $(d - c)/(b - a)$. Moreover, the probability that the randomly selected point lies in any interval of length $(d - c)$ is the same.

We present some examples. 


Example 6. A point is picked “at random” from a unit square. Let $\Omega = \{(x, y): 0 < x < 1, 0 < y < 1\}$. It is clear that all rectangles and their unions must be in S; so, too, should be all circles in the unit square, since the area of a circle is also well defined. Indeed, every set that has a well-defined area has to be in S. We choose $S = \mathcal{B}_2$, the Borel $\sigma$-field generated by rectangles in $\Omega$. As for the probability assignment, if $A \in S$, we assign PA to A, where PA is the area of the set A. If $A = \{(x, y): 0 < x < \tfrac{1}{2}, \tfrac{1}{2} < y < 1\}$, then $PA = \tfrac{1}{4}$. If B is a circle with center $(\tfrac{1}{2}, \tfrac{1}{2})$ and radius $\tfrac{1}{2}$, then $PB = \pi(\tfrac{1}{2})^2 = \pi/4$. If C is the set of all points that are at most a unit distance from the origin, then $PC = \pi/4$ (see Figs. 1 to 3).

Fig. 1. $A = \{(x, y): 0 < x < \tfrac{1}{2}, \tfrac{1}{2} < y < 1\}$.
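
The area assignment of Example 6 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text: the function name `estimate_probability`, the sample size, and the seed are illustrative assumptions. It draws points uniformly from the unit square and counts how often they land in A, B, and C.

```python
import math
import random

def estimate_probability(indicator, n_points=200_000, seed=0):
    """Estimate P(A) for a point chosen uniformly in the unit square.

    `indicator(x, y)` returns True when (x, y) belongs to the set A.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if indicator(x, y):
            hits += 1
    return hits / n_points

# The three sets of Example 6 (restricted to the unit square).
in_A = lambda x, y: x < 0.5 and y > 0.5                        # area 1/4
in_B = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25    # area pi/4
in_C = lambda x, y: x * x + y * y <= 1.0                       # area pi/4

p_A = estimate_probability(in_A)
p_B = estimate_probability(in_B)
p_C = estimate_probability(in_C)
print(p_A, p_B, p_C, math.pi / 4)
```

The estimates should lie close to 1/4, π/4, and π/4, with an error that shrinks roughly like the inverse square root of the number of points.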


Fig. 2. $B = \{(x, y): (x - \tfrac{1}{2})^2 + (y - \tfrac{1}{2})^2 \le \tfrac{1}{4}\}$.

Fig. 3. $C = \{(x, y): x^2 + y^2 \le 1\}$.


Example 7 (Buffon’s Needle Problem). We return to Example 1.2.9. A needle (rod) of length $l$ is tossed at random on a plane that is ruled with a series of parallel lines a distance $2l$ apart. We wish to find the probability that the needle will intersect one of the lines. Denoting by $r$ the distance from the center of the needle to the closest line and by $\theta$ the angle that the needle forms with this line, we see that a necessary and sufficient condition for the needle to intersect the line is that $r < (l/2)\sin\theta$. The needle will intersect the nearest line if and only if its center falls in the shaded region in Fig. 1.2.2. We assign probability to an event A as follows:

$$PA = \frac{\text{area of set } A}{l\pi}.$$

Thus the required probability is

$$\frac{1}{l\pi}\int_0^{\pi} \frac{l}{2}\sin\theta \, d\theta = \frac{1}{\pi}.$$

Here we have interpreted at random to mean that the position of the needle is characterized by a point $(r, \theta)$ which lies in the rectangle $0 < r < l$, $0 < \theta < \pi$. We have assumed that the probability that the point $(r, \theta)$ lies in any arbitrary subset of this rectangle is proportional to the area of this set. Roughly, this means that “all positions of the midpoint of the needle are assigned the same weight and all directions of the needle are assigned the same weight.”
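
The interpretation of "at random" used in the needle problem translates directly into a simulation. The sketch below is illustrative only: the function name `buffon_estimate`, the number of throws, and the seed are assumptions. It draws $(r, \theta)$ uniformly from the rectangle $0 < r < l$, $0 < \theta < \pi$ and counts the throws with $r < (l/2)\sin\theta$.

```python
import math
import random

def buffon_estimate(length=1.0, n_throws=200_000, seed=1):
    """Estimate the crossing probability for a needle of the given length
    thrown on lines spaced 2*length apart."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        r = rng.uniform(0.0, length)        # distance from needle center to nearest line
        theta = rng.uniform(0.0, math.pi)   # angle between needle and the lines
        if r < (length / 2.0) * math.sin(theta):
            hits += 1
    return hits / n_throws

estimate = buffon_estimate()
print(estimate, 1.0 / math.pi)
```

The estimate should be close to $1/\pi \approx 0.318$, whatever needle length is passed in, since the spacing scales with the needle.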


Example 8. An interval of length 1, say (0, 1), is divided into three intervals by choosing two points at random. What is the probability that the three line segments form a triangle?

It is clear that a necessary and sufficient condition for the three segments to form a triangle is that the length of any one of the segments be less than the sum of the other two. Let x, y be the abscissas of the two points chosen at random. Then we must have either

$$0 < x < \tfrac{1}{2} < y < 1 \quad \text{and} \quad y - x < \tfrac{1}{2}$$

or

$$0 < y < \tfrac{1}{2} < x < 1 \quad \text{and} \quad x - y < \tfrac{1}{2}.$$

This is precisely the shaded area in Fig. 4. It follows that the required probability is $\tfrac{1}{4}$.

If it is specified in advance that the point x is chosen at random from $(0, \tfrac{1}{2})$, and the point y at random from $(\tfrac{1}{2}, 1)$, we must have

$$0 < x < \tfrac{1}{2}, \qquad \tfrac{1}{2} < y < 1,$$

and

$$y - x < x + 1 - y \quad \text{or} \quad 2(y - x) < 1.$$

In this case the area bounded by these lines is the shaded area in Fig. 5, and it follows that the required probability is $\tfrac{1}{2}$.

Note the difference in sample spaces in the two computations made above.
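
The two sample spaces of Example 8 can be compared by simulation. This is a rough sketch under stated assumptions: the helper names `forms_triangle`, `estimate`, `both_free`, and `restricted` are hypothetical, as are the sample size and seed. The triangle test uses the fact that three segments of total length 1 form a triangle exactly when each is shorter than 1/2.

```python
import random

def forms_triangle(x, y):
    """True when cutting (0, 1) at x and y gives three segments that form a triangle."""
    a, b = min(x, y), max(x, y)
    sides = (a, b - a, 1.0 - b)
    # Each side must be less than the sum of the other two, i.e. less than 1/2.
    return max(sides) < 0.5

def estimate(sample_point, n=200_000, seed=2):
    """Fraction of sampled cut pairs that form a triangle."""
    rng = random.Random(seed)
    return sum(forms_triangle(*sample_point(rng)) for _ in range(n)) / n

# First computation: both points uniform on (0, 1).
both_free = lambda rng: (rng.random(), rng.random())
# Second computation: x uniform on (0, 1/2), y uniform on (1/2, 1).
restricted = lambda rng: (rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0))

p_free = estimate(both_free)
p_restricted = estimate(restricted)
print(p_free, p_restricted)
```

The first estimate should be near 1/4 and the second near 1/2, which reflects the change of sample space rather than any change in the triangle condition.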


Fig. 4. $\{(x, y): 0 < x < \tfrac{1}{2} < y < 1 \text{ and } (y - x) < \tfrac{1}{2}, \text{ or } 0 < y < \tfrac{1}{2} < x < 1 \text{ and } (x - y) < \tfrac{1}{2}\}$.

Fig. 5. $\{(x, y): 0 < x < \tfrac{1}{2}, \tfrac{1}{2} < y < 1 \text{ and } 2(y - x) < 1\}$.


Example 9 (Bertrand’s Paradox). A chord is drawn at random in the unit circle. What is the probability that the chord is longer than the side of the equilateral triangle inscribed in the circle?

We present here three solutions to this problem, depending on how we interpret the phrase at random. The paradox is resolved once we define the probability spaces carefully.

SOLUTION 1. Since the length of a chord is uniquely determined by the position of its midpoint, choose a point C at random in the circle and draw a line through C and O, the center of the circle (Fig. 6). Draw the chord through C perpendicular to the line OC. If $l_1$ is the length of the chord with C as midpoint, $l_1 > \sqrt{3}$ if and only if C lies inside the circle with center O and radius $\tfrac{1}{2}$. Thus $PA = \pi(\tfrac{1}{2})^2/\pi = \tfrac{1}{4}$.

In this case $\Omega$ is the circle with center O and radius 1, and the event A is the concentric circle with center O and radius $\tfrac{1}{2}$. S is the usual Borel $\sigma$-field of subsets of $\Omega$.

SOLUTION 2. Because of symmetry, we may fix one endpoint of the chord at some point P and then choose the other endpoint $P_1$ at random. Let the probability that $P_1$ lies on an arbitrary arc of the circle be proportional to the length of this arc. Now the inscribed equilateral triangle having P as one of its vertices divides the circumference into three equal parts. A chord drawn through P will be longer than the side of the triangle if and only if the other endpoint $P_1$ (Fig. 7) of the chord lies on that one-third of the circumference that is opposite P. It follows that the required probability is $\tfrac{1}{3}$. In this case $\Omega = [0, 2\pi]$, $S = \mathcal{B}_1 \cap \Omega$, and $A = [2\pi/3, 4\pi/3]$.

Fig. 6.

Fig. 7.


SOLUTION 3. Note that the length of a chord is determined uniquely by the distance of its midpoint from the center of the circle. Due to the symmetry of the circle, we assume that the midpoint of the chord lies on a fixed radius, OM, of the circle (Fig. 8). The probability that the midpoint M lies in a given segment of the radius through M is then proportional to the length of this segment. Clearly, the length of the chord will be longer than the side of the inscribed equilateral triangle if the length of OM is less than radius/2. It follows that the required probability is $\tfrac{1}{2}$.
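
The three interpretations of "at random" in Bertrand's paradox can be placed side by side in a simulation. The code below is a sketch only: the function names `solution_1`, `solution_2`, `solution_3`, the sample size, and the seed are illustrative assumptions. Each sampler returns a chord length in the unit circle, and the estimate is the fraction of chords longer than $\sqrt{3}$, the side of the inscribed equilateral triangle.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of the equilateral triangle inscribed in the unit circle

def chord_from_midpoint_distance(d):
    """Length of the chord whose midpoint is at distance d from the center."""
    return 2.0 * math.sqrt(max(0.0, 1.0 - d * d))

def solution_1(rng):
    # Midpoint uniform over the disk (rejection sampling from the enclosing square).
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return chord_from_midpoint_distance(math.hypot(x, y))

def solution_2(rng):
    # One endpoint fixed, the other uniform on the circumference.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * math.sin(angle / 2.0)

def solution_3(rng):
    # Midpoint uniform along a fixed radius.
    return chord_from_midpoint_distance(rng.uniform(0.0, 1.0))

def estimate(method, n=200_000, seed=3):
    rng = random.Random(seed)
    return sum(method(rng) > SIDE for _ in range(n)) / n

estimates = [estimate(m) for m in (solution_1, solution_2, solution_3)]
print(estimates)
```

The three estimates should be near 1/4, 1/3, and 1/2, respectively: each sampler corresponds to a different probability space, which is exactly the point of the example.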
Fig. 8.


PROBLEMS 1.3

1. Let $\Omega$ be the set of all nonnegative integers and S the class of all subsets of $\Omega$. In each of the following cases, does P define a probability on $(\Omega, S)$?

(a) For $A \in S$, let

$$PA = \sum_{x \in A} \frac{e^{-\lambda}\lambda^x}{x!}, \qquad \lambda > 0.$$

(b) For $A \in S$, let

$$PA = \sum_{x \in A} p(1 - p)^x, \qquad 0 < p < 1.$$

(c) For $A \in S$, let PA = 1 if A has a finite number of elements, and PA = 0 otherwise.

2. Let $\Omega = \mathcal{R}$ and $S = \mathcal{B}$. In each of the following cases, does P define a probability on $(\Omega, S)$?

(a) For each interval I, let

(b) For each interval I, let PI = 1 if I is an interval of finite length, and PI = 0 if I is an infinite interval.

(c) For each interval I, let PI = 0 if $I \subseteq (-\infty, 1)$ and $PI = \int_I \tfrac{1}{2}\,dx$ if $I \subseteq [1, \infty)$. (If $I = I_1 + I_2$, where $I_1 \subseteq (-\infty, 1)$ and $I_2 \subseteq [1, \infty)$, then $PI = PI_2$.)

3. Let A and B be two events such that $B \supseteq A$. What is $P(A \cup B)$? What is $P(A \cap B)$? What is $P(A - B)$?

4. In Problem 1(a) and (b), let A = {all integers > 2}, B = {all nonnegative integers < 3}, and C = {all integers x, 3 < x < 6}. Find PA, PB, PC, $P(A \cap B)$, $P(A \cup B)$, $P(B \cup C)$, $P(A \cap C)$, and $P(B \cap C)$.

5. In Problem 2(a), let A be the event $A = \{x: x \ge 0\}$. Find PA. Also find $P\{x: x > 0\}$.

6. A box contains 1000 light bulbs. The probability that there is at least 1 defective bulb in the box is 0.1, and the probability that there are at least 2 defective bulbs is 0.05. Find the probability in each of the following cases:

(a) The box contains no defective bulbs.
(b) The box contains exactly 1 defective bulb.
(c) The box contains at most 1 defective bulb.


7. Two points are chosen at random on a line of unit length. Find the probability 


that each of the three line segments so formed will have a length > i. 


8. Find the probability that the sum of two randomly chosen positive numbers (both 
< 1) will not exceed 1 and that their product will be < z. 


9. Prove Theorem 3.

10. Let $\{A_n\}$ be a sequence of events such that $A_n \to A$ as $n \to \infty$. Show that $PA_n \to PA$ as $n \to \infty$.

11. The base and altitude of a right triangle are obtained by picking points randomly from [0, a] and [0, b], respectively. Show that the probability that the area of the triangle so formed will be less than ab/4 is $(1 + \ln 2)/2$.

12. A point X is chosen at random on a line segment AB. (a) Show that the probability that the ratio of lengths AX/BX is smaller than a (a > 0) is a/(1 + a). (b) Show that the probability that the ratio of the length of the shorter segment to that of the larger segment is less than $\tfrac{1}{3}$ is $\tfrac{1}{2}$.



1.4 COMBINATORICS: PROBABILITY ON FINITE SAMPLE SPACES 


In this section we restrict attention to sample spaces that have at most a finite number of points. Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and S be the $\sigma$-field of all subsets of $\Omega$. For any $A \in S$,

$$PA = \sum_{\omega_j \in A} P\{\omega_j\}.$$


Definition 1. An assignment of probability is said to be equally likely (or uniform) if each elementary event in $\Omega$ is assigned the same probability. Thus, if $\Omega$ contains n points $\omega_j$, $P\{\omega_j\} = 1/n$, $j = 1, 2, \ldots, n$.

With this assignment

$$PA = \frac{\text{number of elementary events in } A}{\text{total number of elementary events in } \Omega}. \tag{1}$$

Example 1. A coin is tossed twice. The sample space consists of four points. Under the uniform assignment, each of four elementary events is assigned probability $\tfrac{1}{4}$.

Example 2. Three dice are rolled. The sample space consists of $6^3$ points. Each one-point set is assigned probability $1/6^3$.


In games of chance we usually deal with finite sample spaces where uniform prob- 
ability is assigned to all simple events. The same is the case in sampling schemes. In 
such instances the computation of the probability of an event A reduces to a combi- 
natorial counting problem. We therefore consider some rules of counting. 


Rule 1. Given a collection of $n_1$ elements $a_{11}, a_{12}, \ldots, a_{1n_1}$, $n_2$ elements $a_{21}, a_{22}, \ldots, a_{2n_2}$, and so on, up to $n_k$ elements $a_{k1}, a_{k2}, \ldots, a_{kn_k}$, it is possible to form $n_1 \cdot n_2 \cdots n_k$ ordered k-tuples $(a_{1j_1}, a_{2j_2}, \ldots, a_{kj_k})$ containing one element of each kind, $1 \le j_i \le n_i$, $i = 1, 2, \ldots, k$.

Example 3. Here r distinguishable balls are to be placed in n cells. This amounts to choosing one cell for each ball. The sample space consists of $n^r$ r-tuples $(i_1, i_2, \ldots, i_r)$, where $i_j$ is the cell number of the jth ball, $j = 1, 2, \ldots, r$ $(1 \le i_j \le n)$.

Consider r tossings with a coin. There are $2^r$ possible outcomes. The probability that no heads will show up in r throws is $(\tfrac{1}{2})^r$. Similarly, the probability that no 6 will turn up in r throws of a die is $(\tfrac{5}{6})^r$.


Rule 2 is concerned with ordered samples. Consider a set of n elements $a_1, a_2, \ldots, a_n$. Any ordered arrangement $(a_{i_1}, a_{i_2}, \ldots, a_{i_r})$ of r of these n symbols is called an ordered sample of size r. If elements are selected one by one, there are two possibilities:

1. Sampling with replacement. In this case repetitions are permitted, and we can draw samples of an arbitrary size. Clearly, there are $n^r$ samples of size r.

2. Sampling without replacement. In this case an element once chosen is not replaced, so that there can be no repetitions. Clearly, the sample size cannot exceed n, the size of the population. There are $n(n - 1)\cdots(n - r + 1) = {}_nP_r$, say, possible samples of size r. Clearly, ${}_nP_r = 0$ for integers r > n. If r = n, then ${}_nP_r = n!$.

Rule 2. If ordered samples of size r are drawn from a population of n elements, there are $n^r$ different samples with replacement and ${}_nP_r$ samples without replacement.

Corollary. The number of permutations of n objects is n!.

Remark 1. We frequently use the term random sample in this book to describe the equal assignment of probability to all possible samples in sampling from a finite population. Thus, when we speak of a random sample of size r from a population of n elements, it means that in sampling with replacement, each of $n^r$ samples has the same probability $1/n^r$ or that in sampling without replacement, each of ${}_nP_r$ samples is assigned probability $1/{}_nP_r$.

Example 4. Consider a set of n elements. A sample of size r is drawn at random with replacement. Then the probability that no element appears more than once is clearly ${}_nP_r/n^r$.

Thus, if n balls are to be randomly placed in n cells, the probability that each cell will be occupied is $n!/n^n$.


Example 5. Consider a class of r students. The birthdays of these r students form a sample of size r from the 365 days in the year. Then the probability that all r birthdays are different is ${}_{365}P_r/(365)^r$. One can show that this probability is $< \tfrac{1}{2}$ if r = 23.

The following table gives the values of $q_r = {}_{365}P_r/(365)^r$ for some selected values of r.

r      20     23     25     30     35     60
q_r    0.589  0.493  0.431  0.294  0.186  0.006
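
The table entries can be recomputed directly from the formula $q_r = {}_{365}P_r/(365)^r$. The snippet below is a small sketch for checking them; the function name `q` and the `days` parameter are illustrative, and `math.perm` requires Python 3.8 or later.

```python
from math import perm

def q(r, days=365):
    """Probability that r birthdays are all different: (days)P_r / days**r."""
    return perm(days, r) / days ** r

# Recompute the tabulated values, rounded to three decimals.
table = {r: round(q(r), 3) for r in (20, 23, 25, 30, 35, 60)}
print(table)

# r = 23 is the smallest class size for which q_r drops below 1/2.
smallest_r = next(r for r in range(1, 366) if q(r) < 0.5)
print(smallest_r)
```

Under the uniform-birthday assumption of the example, the last line reports 23 as the first class size at which a shared birthday is more likely than not.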


Next suppose that each of the r students is asked for his or her birth date in order, with the instruction that as soon as a student hears his or her birth date the student is to raise a hand. Let us compute the probability that a hand is first raised when the kth $(k = 1, 2, \ldots, r)$ student is asked his or her birth date. Let $p_k$ be the probability that the procedure terminates at the kth student. Then

$$p_1 = 1 - \left(\frac{364}{365}\right)^{r-1}$$

and

$$p_k = \frac{{}_{365}P_{k-1}}{(365)^{k-1}} \left(1 - \frac{k-1}{365}\right)^{r-k+1} \left[1 - \left(\frac{365-k}{365-k+1}\right)^{r-k}\right], \qquad k = 2, 3, \ldots, r.$$


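Example 6. Let $\Omega$ be the set of all permutations of n objects. Let $A_i$ be the set of all permutations that leave the ith object unchanged. Then the set $\bigcup_{i=1}^{n} A_i$ is the set of permutations with at least one fixed point. Clearly,

$$PA_i = \frac{(n - 1)!}{n!}, \qquad i = 1, 2, \ldots, n,$$

$$P(A_i \cap A_j) = \frac{(n - 2)!}{n!}, \qquad i < j; \quad i, j = 1, 2, \ldots, n, \text{ etc.}$$

By Theorem 1.3.3 we have

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}.$$

As an application, consider an absentminded secretary who places n letters in n envelopes at random. Then the probability that he or she will misplace every letter is

$$1 - P\Big(\bigcup_{i=1}^{n} A_i\Big) = \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n}\frac{1}{n!}.$$

It is easy to see that this last probability $\to e^{-1} = 0.3679$ as $n \to \infty$.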
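
The probability that a random permutation has no fixed point can be checked against direct enumeration for small n. The sketch below is illustrative: both function names are hypothetical. It uses the alternating sum $\sum_{k=0}^{n} (-1)^k/k!$, whose first two terms cancel, so it agrees with the inclusion-exclusion expression for the probability of misplacing every letter.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def prob_no_fixed_point(n):
    """Inclusion-exclusion value: sum over k = 0..n of (-1)^k / k!."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

def prob_no_fixed_point_bruteforce(n):
    """Enumerate all n! permutations and count those with no fixed point."""
    perms = list(permutations(range(n)))
    good = sum(all(p[i] != i for i in range(n)) for p in perms)
    return Fraction(good, len(perms))

for n in range(1, 7):
    print(n, prob_no_fixed_point(n), prob_no_fixed_point_bruteforce(n))

print(float(prob_no_fixed_point(10)))  # already very close to exp(-1)
```

The two columns agree for every n enumerated, and the value for n = 10 is already within $10^{-4}$ of $e^{-1}$.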


Rule 3. There are $\binom{n}{r}$ different subpopulations of size $r \le n$ from a population of n elements, where

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}. \tag{2}$$


Example 7. Consider the random distribution of r balls in n cells. Let $A_k$ be the event that a specified cell has exactly k balls, $k = 0, 1, 2, \ldots, r$; k balls can be chosen in $\binom{r}{k}$ ways. We place k balls in the specified cell and distribute the remaining $r - k$ balls in the $n - 1$ cells in $(n - 1)^{r-k}$ ways. Thus

$$P(A_k) = \binom{r}{k}\frac{(n - 1)^{r-k}}{n^r} = \binom{r}{k}\left(\frac{1}{n}\right)^{k}\left(1 - \frac{1}{n}\right)^{r-k}.$$

Example 8. There are $\binom{52}{13} = 635{,}013{,}559{,}600$ different hands at bridge and $\binom{52}{5} = 2{,}598{,}960$ hands at poker.

The probability that all 13 cards in a bridge hand have different face values is

$$4^{13} \Big/ \binom{52}{13}.$$

The probability that a hand at poker contains five different face values is

$$\binom{13}{5} 4^{5} \Big/ \binom{52}{5}.$$
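
The counts quoted for bridge and poker hands are easy to confirm with exact integer arithmetic. The following sketch (variable names are illustrative, and `math.comb` requires Python 3.8 or later) also evaluates the two probabilities of hands with all face values different.

```python
from fractions import Fraction
from math import comb

bridge_hands = comb(52, 13)   # number of 13-card hands
poker_hands = comb(52, 5)     # number of 5-card hands

# All 13 face values different: one of 4 suits for each face value.
p_bridge_all_different = Fraction(4 ** 13, bridge_hands)

# Five different face values: choose the 5 values, then a suit for each.
p_poker_all_different = Fraction(comb(13, 5) * 4 ** 5, poker_hands)

print(bridge_hands, poker_hands)
print(float(p_bridge_all_different), float(p_poker_all_different))
```

The poker probability comes out slightly above one half, while the bridge probability is of order $10^{-4}$.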


Rule 4. Consider a population of n elements. The number of ways in which the population can be partitioned into k subpopulations of sizes $r_1, r_2, \ldots, r_k$, respectively, $r_1 + r_2 + \cdots + r_k = n$, $0 \le r_i \le n$, is given by

$$\binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1!\, r_2! \cdots r_k!}. \tag{3}$$

The numbers defined in (3) are known as multinomial coefficients.

Proof. For the proof of Rule 4, one uses Rule 3 repeatedly. Note that

$$\binom{n}{r_1, r_2, \ldots, r_k} = \binom{n}{r_1}\binom{n - r_1}{r_2} \cdots \binom{n - r_1 - \cdots - r_{k-2}}{r_{k-1}}.$$


Example 9. In a game of bridge the probability that a hand of 13 cards contains 2 spades, 7 hearts, 3 diamonds, and 1 club is

$$\frac{\binom{13}{2}\binom{13}{7}\binom{13}{3}\binom{13}{1}}{\binom{52}{13}}.$$

Example 10. An urn contains 5 red, 3 green, 2 blue, and 4 white balls. A sample of size 8 is selected at random without replacement. The probability that the sample contains 2 red, 2 green, 1 blue, and 3 white balls is

$$\frac{\binom{5}{2}\binom{3}{2}\binom{2}{1}\binom{4}{3}}{\binom{14}{8}}.$$
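
Examples 9 and 10 share the same structure: a product of binomial coefficients, one per group, divided by the number of unordered samples. A small helper makes this explicit. It is a sketch only; the name `multivariate_hypergeometric` is our own label for the computation, and `math.comb` and `math.prod` require Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, prod

def multivariate_hypergeometric(group_sizes, draws):
    """Probability of drawing draws[i] items from group i when sum(draws)
    items are sampled without replacement from the pooled population."""
    numerator = prod(comb(n, k) for n, k in zip(group_sizes, draws))
    return Fraction(numerator, comb(sum(group_sizes), sum(draws)))

# Example 9: 2 spades, 7 hearts, 3 diamonds, 1 club in a bridge hand.
p_bridge = multivariate_hypergeometric([13, 13, 13, 13], [2, 7, 3, 1])

# Example 10: 2 red, 2 green, 1 blue, 3 white from an urn with 5, 3, 2, 4 balls.
p_urn = multivariate_hypergeometric([5, 3, 2, 4], [2, 2, 1, 3])

print(float(p_bridge), p_urn)
```

For Example 10 the exact value reduces to 80/1001, roughly 0.08.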


PROBLEMS 1.4

1. How many different words can be formed by permuting letters of the word Mississippi? How many of these start with the letters Mi?

2. An urn contains R red and W white marbles. Marbles are drawn from the urn one after another without replacement. Let $A_k$ be the event that a red marble is drawn for the first time on the kth draw. Show that

$$PA_k = \frac{R}{R + W - k + 1} \prod_{j=1}^{k-1}\left(1 - \frac{R}{R + W - j + 1}\right).$$

Let p be the proportion of red marbles in the urn before the first draw. Show that $PA_k \to p(1 - p)^{k-1}$ as $R + W \to \infty$. Is this to be expected?

3. In a population of N elements, R are red and W = N − R are white. A group of n elements is selected at random. Find the probability that the group so chosen will contain exactly r red elements.

4. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit number. If the numbers corresponding to all possible permutations are listed in increasing order of magnitude, find the 319th number on this list.

5. The numbers 1, 2, ..., n are arranged in random order. Find the probability that the digits 1, 2, ..., k (k < n) appear as neighbors in that order.

6. A pinball table has seven holes through which a ball can drop. Five balls are played. Assuming that at each play a ball is equally likely to go down any one of the seven holes, find the probability that more than one ball goes down at least one of the holes.

7. If 2m boys are divided into two equal subgroups, find the probability that the two tallest boys will be (a) in different subgroups, and (b) in the same subgroup.

8. In a movie theater that can accommodate n + k people, n people are seated. What is the probability that $r \le n$ given seats are occupied?

9. Waiting in line for a Saturday morning movie show are 2n children. Tickets are priced at a quarter each. Find the probability that nobody will have to wait for change if before a ticket is sold to the first customer, the cashier has 2k (k < n) quarters. Assume that it is equally likely that each ticket is paid for with a quarter or a half-dollar coin.

10. Each box of a certain brand of breakfast cereal contains a small charm, with k distinct charms forming a set. Assuming that the chance of drawing any particular charm is equal to that of drawing any other charm, show that the probability of finding at least one complete set of charms in a random purchase of $N \ge k$ boxes equals

$$1 - \binom{k}{1}\left(\frac{k-1}{k}\right)^{N} + \binom{k}{2}\left(\frac{k-2}{k}\right)^{N} - \cdots + (-1)^{k-1}\binom{k}{k-1}\left(\frac{1}{k}\right)^{N}.$$

[Hint: Use (1.3.7).]

11. Prove Rules 1 through 4.

12. In a five-card poker game, find the probability that a hand will have:

(a) A royal flush (ace, king, queen, jack, and 10 of the same suit).
(b) A straight flush (five cards in a sequence, all of the same suit; ace is high but A, 2, 3, 4, 5 is also a sequence), excluding a royal flush.
(c) Four of a kind (four cards of the same face value).
(d) A full house (three cards of the same face value x and two cards of the same face value y).
(e) A flush (five cards of the same suit, excluding cards in a sequence).
(f) A straight (five cards in a sequence).
(g) Three of a kind (three cards of the same face value and two cards of different face values).
(h) Two pairs.
(i) A single pair.

13. (a) A married couple and four of their friends enter a row of seats in a concert hall. What is the probability that the wife will sit next to her husband if all possible seating arrangements are equally likely?

(b) In part (a), suppose that the six people go to a restaurant after the concert and sit at a round table. What is the probability that the wife will sit next to her husband?

14. Consider a town with N people. A person sends two letters to two separate people, each of whom is asked to repeat the procedure. Thus for each letter received, two letters are sent out to separate persons chosen at random (irrespective of what happened in the past). What is the probability that in the first n stages the person who started the chain letter game will not receive a letter?

15. Consider a town with N people. A person tells a rumor to a second person, who in turn repeats it to a third person, and so on. Suppose that at each stage the recipient of the rumor is chosen at random from the remaining N − 1 people. What is the probability that the rumor will be repeated n times:

(a) Without being repeated to any person?
(b) Without being repeated to the originator?

16. There were four accidents in a town during a seven-day period. Would you be surprised if all four occurred on the same day? If each of the four occurred on a different day?

17. Whereas Rules 1 and 2 of counting deal with ordered samples with or without replacement, Rule 3 concerns unordered sampling without replacement. The most difficult rule of counting deals with unordered with replacement sampling. Show that there are $\binom{n + r - 1}{r}$ possible unordered samples of size r from a population of n elements when sampled with replacement.


1.5 CONDITIONAL PROBABILITY AND BAYES THEOREM


So far, we have computed probabilities of events on the assumption that no infor- 
mation was available about the experiment other than the sample space. Sometimes, 
however, it is known that an event H has happened. How do we use this informa- 
tion in making a statement concerning the outcome of another event A? Consider the 
following examples. 


Example 1. Let urn 1 contain one white and two black balls, and urn 2, one black and two white balls. A fair coin is tossed. If a head turns up, a ball is drawn at random from urn 1; otherwise, from urn 2. Let E be the event that the ball drawn is black. The sample space is $\Omega = \{Hb_{11}, Hb_{12}, Hw_{11}, Tb_{21}, Tw_{21}, Tw_{22}\}$, where H denotes head, T denotes tail, $b_{ij}$ denotes jth black ball in ith urn, i = 1, 2, and so on. Then

$$PE = P\{Hb_{11}, Hb_{12}, Tb_{21}\} = \tfrac{3}{6} = \tfrac{1}{2}.$$

If, however, it is known that the coin showed a head, the ball could not have been drawn from urn 2. Thus, the probability of E, conditional on information H, is $\tfrac{2}{3}$.

Note that this probability equals the ratio P{head and ball drawn black}/P{head}.

Example 2. Let us toss two fair coins. Then the sample space of the experiment is $\Omega$ = {HH, HT, TH, TT}. Let event A = {both coins show same face} and B = {at least one coin shows H}. Then $PA = \tfrac{1}{2}$. If B is known to have happened, this information assures that TT cannot happen, and P{A conditional on the information that B has happened} $= \tfrac{1}{3} = \tfrac{1}{4}\big/\tfrac{3}{4} = P(A \cap B)/PB$.
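
The two-coin example can be reproduced by enumerating the sample space and counting. The helper names `prob` and `cond_prob` below are illustrative, and the sketch assumes the four outcomes are equally likely, as in the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coins: HH, HT, TH, TT (equally likely).
omega = ["".join(t) for t in product("HT", repeat=2)]

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(event, given):
    """P{event | given} = P(event and given) / P(given); requires P(given) > 0."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

same_face = lambda w: w[0] == w[1]         # event A
at_least_one_head = lambda w: "H" in w     # event B

print(prob(same_face))                          # 1/2
print(cond_prob(same_face, at_least_one_head))  # 1/3
```

Knowing that B occurred removes TT from consideration, which lowers the probability of A from 1/2 to 1/3.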


Definition 1. Let $(\Omega, S, P)$ be a probability space, and let $H \in S$ with PH > 0. For an arbitrary $A \in S$ we shall write

$$P\{A \mid H\} = \frac{P(A \cap H)}{PH} \tag{1}$$

and call the quantity so defined the conditional probability of A, given H. Conditional probability remains undefined when PH = 0.


Theorem 1. Let $(\Omega, S, P)$ be a probability space, and let $H \in S$ with PH > 0. Then $(\Omega, S, P_H)$, where $P_H(A) = P\{A \mid H\}$ for all $A \in S$, is a probability space.

Proof. Clearly, $P_H(A) = P\{A \mid H\} \ge 0$ for all $A \in S$. Also, $P_H(\Omega) = P(\Omega \cap H)/PH = 1$. If $A_1, A_2, \ldots$ is a disjoint sequence of sets in S, then

$$P_H\Big(\sum_{i=1}^{\infty} A_i\Big) = \frac{P\big(\sum_{i=1}^{\infty} (A_i \cap H)\big)}{PH} = \sum_{i=1}^{\infty} \frac{P(A_i \cap H)}{PH} = \sum_{i=1}^{\infty} P_H(A_i).$$

Remark 1. What we have done is to consider a new sample space consisting of the basic set H and the $\sigma$-field $S_H = S \cap H$, of subsets $A \cap H$, $A \in S$, of H. On this space we have defined a set function $P_H$ by multiplying the probability of each event by $(PH)^{-1}$. Indeed, $(H, S_H, P_H)$ is a probability space.

Let A and B be two events with PA > 0, PB > 0. Then it follows from (1) that

$$P(A \cap B) = PA \cdot P\{B \mid A\}, \quad \text{and} \quad P(A \cap B) = PB \cdot P\{A \mid B\}. \tag{2}$$


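Equations (2) may be generalized to any number of events. Let $A_1, A_2, \ldots, A_n \in S$, $n \ge 2$, and assume that $P\big(\bigcap_{j=1}^{n-1} A_j\big) > 0$. Since

$$A_1 \supset (A_1 \cap A_2) \supset (A_1 \cap A_2 \cap A_3) \supset \cdots \supset \Big(\bigcap_{j=1}^{n-2} A_j\Big) \supset \Big(\bigcap_{j=1}^{n-1} A_j\Big),$$

we see that

$$PA_1 > 0, \quad P(A_1 \cap A_2) > 0, \quad \ldots, \quad P\Big(\bigcap_{j=1}^{n-2} A_j\Big) > 0.$$

It follows that $P\{A_k \mid \bigcap_{j=1}^{k-1} A_j\}$ are well-defined for $k = 2, 3, \ldots, n$.

Theorem 2 (Multiplication Rule). Let $(\Omega, S, P)$ be a probability space and $A_1, A_2, \ldots, A_n \in S$, with $P\big(\bigcap_{j=1}^{n-1} A_j\big) > 0$. Then

$$P\Big(\bigcap_{j=1}^{n} A_j\Big) = P(A_1)\, P\{A_2 \mid A_1\}\, P\{A_3 \mid A_1 \cap A_2\} \cdots P\Big\{A_n \,\Big|\, \bigcap_{j=1}^{n-1} A_j\Big\}. \tag{3}$$

Proof. The proof is simple.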
Let us suppose that $\{H_j\}$ is a countable collection of events in S such that $H_j \cap H_k = \emptyset$, $j \ne k$, and $\sum_{j=1}^{\infty} H_j = \Omega$. Suppose that $PH_j > 0$ for all j. Then

$$PB = \sum_{j=1}^{\infty} P(H_j)\, P\{B \mid H_j\} \quad \text{for all } B \in S. \tag{4}$$

For the proof we note that

$$B = \sum_{j=1}^{\infty} (B \cap H_j),$$

and the result follows. Equation (4) is called the total probability rule.

Example 3. Consider a hand of five cards in a game of poker. If the cards are dealt at random, there are $\binom{52}{5}$ possible hands of five cards each. Let A = {at least 3 cards of spades}, B = {all 5 cards of spades}. Then

$$P(A \cap B) = P\{\text{all 5 cards of spades}\} = \binom{13}{5} \Big/ \binom{52}{5}$$

and

$$P\{B \mid A\} = \frac{P(A \cap B)}{PA} = \frac{\binom{13}{5} \Big/ \binom{52}{5}}{\left[\binom{13}{3}\binom{39}{2} + \binom{13}{4}\binom{39}{1} + \binom{13}{5}\right] \Big/ \binom{52}{5}}.$$

Example 4. Urn 1 contains one white and two black marbles, urn 2 contains one black and two white marbles, and urn 3 contains three black and three white marbles. A die is rolled. If a 1, 2, or 3 shows up, urn 1 is selected; if a 4 shows up, urn 2 is selected; and if a 5 or 6 shows up, urn 3 is selected. A marble is then drawn at random from the urn selected. Let A be the event that the marble drawn is white. If U, V, W, respectively, denote the events that the urn selected is 1, 2, 3, then

$$A = (A \cap U) + (A \cap V) + (A \cap W),$$

$$P(A \cap U) = P(U) \cdot P\{A \mid U\} = \tfrac{3}{6} \cdot \tfrac{1}{3},$$

$$P(A \cap V) = P(V) \cdot P\{A \mid V\} = \tfrac{1}{6} \cdot \tfrac{2}{3},$$

$$P(A \cap W) = P(W) \cdot P\{A \mid W\} = \tfrac{2}{6} \cdot \tfrac{3}{6}.$$

It follows that

$$PA = \tfrac{1}{6} + \tfrac{1}{9} + \tfrac{1}{6} = \tfrac{4}{9}.$$
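
The total probability rule (4) is a weighted sum, which the urn example makes concrete. The sketch below uses exact fractions; the dictionary names and the function `total_probability` are illustrative choices, and the numbers are those stated in Example 4.

```python
from fractions import Fraction

# Probabilities of selecting each urn (die shows 1-3, 4, or 5-6).
prior = {"U": Fraction(3, 6), "V": Fraction(1, 6), "W": Fraction(2, 6)}

# Probability of drawing a white marble from each urn.
white_given_urn = {"U": Fraction(1, 3), "V": Fraction(2, 3), "W": Fraction(3, 6)}

def total_probability(prior, likelihood):
    """PB = sum over j of P(H_j) * P{B | H_j} for a partition {H_j}."""
    return sum(prior[h] * likelihood[h] for h in prior)

p_white = total_probability(prior, white_given_urn)
print(p_white)  # 4/9
```

The same function applies to any finite partition, provided the prior probabilities sum to 1.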


A simple consequence of the total probability rule is the Bayes rule, which we now prove.

Theorem 3 (Bayes Rule). Let $\{H_n\}$ be a disjoint sequence of events such that $PH_n > 0$, $n = 1, 2, \ldots$, and $\sum_{n=1}^{\infty} H_n = \Omega$. Let $B \in S$ with PB > 0. Then

$$P\{H_j \mid B\} = \frac{P(H_j)\, P\{B \mid H_j\}}{\sum_{i=1}^{\infty} P(H_i)\, P\{B \mid H_i\}}, \qquad j = 1, 2, \ldots. \tag{5}$$

Proof. From (2)

$$P\{B \cap H_j\} = P(B)\, P\{H_j \mid B\} = PH_j\, P\{B \mid H_j\},$$

and it follows that

$$P\{H_j \mid B\} = \frac{PH_j\, P\{B \mid H_j\}}{PB}.$$

The result now follows on using (4).


Remark 2. Suppose that $H_1, H_2, \ldots$ are all the “causes” that lead to the outcome of a random experiment. Let $H_j$ be the set of outcomes corresponding to the jth cause. Assume that the probabilities $PH_j$, $j = 1, 2, \ldots$, called the prior probabilities, can be assigned. Now suppose that the experiment results in an event B of positive probability. This information leads to a reassessment of the prior probabilities. The conditional probabilities $P\{H_j \mid B\}$ are called the posterior probabilities. Formula (5) can be interpreted as a rule giving the probability that observed event B was due to cause or hypothesis $H_j$.


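Example 5. In Example 4, let us compute the conditional probability $P\{V \mid A\}$. We have

$$P\{V \mid A\} = \frac{PV\, P\{A \mid V\}}{PU\, P\{A \mid U\} + PV\, P\{A \mid V\} + PW\, P\{A \mid W\}} = \frac{\tfrac{1}{6} \cdot \tfrac{2}{3}}{\tfrac{3}{6} \cdot \tfrac{1}{3} + \tfrac{1}{6} \cdot \tfrac{2}{3} + \tfrac{2}{6} \cdot \tfrac{3}{6}} = \frac{\tfrac{1}{9}}{\tfrac{4}{9}} = \frac{1}{4}.$$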
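
Formula (5) turns prior probabilities into posterior probabilities by dividing each term of the total probability sum by the whole sum. The sketch below applies this to the urns of Example 4; the function name `posterior` and the dictionary layout are illustrative assumptions, and exact fractions are used so the result can be compared with the hand computation.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes rule for a finite partition:
    P{H_j | B} = P(H_j) P{B | H_j} / sum_i P(H_i) P{B | H_i}."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}

# Urn selection probabilities and white-marble probabilities from Example 4.
prior = {"U": Fraction(3, 6), "V": Fraction(1, 6), "W": Fraction(2, 6)}
white_given_urn = {"U": Fraction(1, 3), "V": Fraction(2, 3), "W": Fraction(3, 6)}

post = posterior(prior, white_given_urn)
print(post["V"])  # 1/4
print(post)
```

Observing a white marble raises the probability of urn 2 from its prior value 1/6 to 1/4, and the posterior probabilities again sum to 1.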


PROBLEMS 1.5

1. Let A and B be two events such that $PA = p_1 > 0$, $PB = p_2 > 0$, and $p_1 + p_2 > 1$. Show that $P\{B \mid A\} \ge 1 - [(1 - p_2)/p_1]$.

2. Two digits are chosen at random without replacement from the set of integers {1, 2, 3, 4, 5, 6, 7, 8}.
(a) Find the probability that both digits are greater than 5.
(b) Show that the probability that the sum of the digits will be equal to 5 is the same as the probability that their sum will exceed 13.

3. The probability of a family chosen at random having exactly k children is $\alpha p^k$, 0 < p < 1. Suppose that the probability that any child has blue eyes is b, 0 < b < 1, independently of others. What is the probability that a family chosen at random has exactly r ($r \ge 0$) children with blue eyes?

4. In Problem 3, let us write

$$p_k = \text{probability of a randomly chosen family having exactly } k \text{ children} = \alpha p^k, \quad k = 1, 2, \ldots, \qquad p_0 = 1 - \frac{\alpha p}{1 - p}.$$

Suppose that all gender distributions of k children are equally likely. Find the probability that a family has exactly r boys, $r \ge 1$. Find the conditional probability that a family has at least two boys, given that it has at least one boy.

5. Each of (N + 1) identical urns marked 0, 1, 2, ..., N contains N balls. The kth urn contains k black and N − k white balls, k = 0, 1, 2, ..., N. An urn is chosen at random, and n random drawings are made from it, the ball drawn always being replaced. If all the n draws result in black balls, find the probability that the (n + 1)th draw will also produce a black ball. How does this probability behave as $N \to \infty$?

6. Each of n urns contains four white and six black balls, while another urn contains
five white and five black balls. An urn is chosen at random from the (n + 1) urns, 
and two balls are drawn from it, both being black. The probability that five white 
and three black balls remain in the chosen urn is 4. Find n. 


. In answering a question on a multiple-choice test, a candidate either knows the 


answer with probability p (0 < p < 1) or does not know the answer with 
probability 1 — p. If he knows the answer, he puts down the correct answer with 
probability 0.99, whereas if he guesses, the probability of his putting down the 
correct result is 1/k (kK choices to the answer). Find the conditional probability 
that the candidate knew the answer to a question, given that he has made the 
correct answer. Show that this probability tends to 1 as k — oo. 


An urn contains five white and four black balls. Four balls are transferred to a 
second urn. A ball is then drawn from this urn, and it happens to be black. Find 
the probability of drawing a white ball from among the remaining three. 


. Prove Theorem 2. 


An um contains r red and g green marbles. A marble is drawn at random and its 
color noted. Then the marble drawn, together with c > 0 marbles of the same 
color, are returned to the urn. Suppose that n such draws are made from the urn. 
Find the probability of selecting a red marble at any draw. 


Consider a bicyclist who leaves a point P (see Fig. 1), choosing one of the roads 
PR, PR2, PR3 at random. At each subsequent crossroad she again chooses a 
road at random. 
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Fig. 1. Map for Problem 11. 


(a) What is the probability that she will arrive at point A? 
(b) What is the conditional probability that she will arrive at A via road P R3? 


Five percent of patients suffering from a certain disease are selected to undergo a 
new treatment that is believed to increase the recovery rate from 30 percent to 50 
percent. A person is randomly selected from these patients after the completion 
of the treatment and is found to have recovered. What is the probability that the 
patient received the new treatment? 


13. Four roads lead away from the county jail. A prisoner has escaped from the jail
and selects a road at random. If road I is selected, the probability of escaping is
4 if road II is selected, the probability of success is h; if road III is selected, the
probability of escaping is i and if road IV is selected, the probability of success
is ae

(a) What is the probability that the prisoner will succeed in escaping?

(b) If the prisoner succeeds, what is the probability that the prisoner escaped by
using road IV? By using road I?


14. A diagnostic test for a certain disease is 95 percent accurate, in that if a person
has the disease, it will detect it with a probability of 0.95, and if a person does not
have the disease, it will give a negative result with a probability of 0.95. Suppose
that only 0.5 percent of the population has the disease in question. A person is
chosen at random from this population. The test indicates that this person has
the disease. What is the (conditional) probability that he or she does have the
disease?

Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let $A, B \in \mathcal{S}$, with $PB > 0$. By the
multiplication rule we have

$$P(A \cap B) = P(B)\,P\{A \mid B\}.$$

In many experiments the information provided by B does not affect the probability
of event A; that is, $P\{A \mid B\} = P\{A\}$.


Example 1. Let two fair coins be tossed, and let A = {head on the second throw},
B = {head on the first throw}. Then

$$P(A) = P\{HH, TH\} = \tfrac{1}{2}, \qquad P(B) = P\{HH, HT\} = \tfrac{1}{2},$$

and

$$P\{A \mid B\} = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} = P(A).$$

Thus

$$P(A \cap B) = P(A)P(B).$$

In the following, we write $A \cap B = AB$.


Definition 1. Two events, A and B, are said to be independent if and only if 
(1) P(AB) = P(A)P(B). 


Note that we have not placed any restriction on P(A) or P(B). Thus conditional
probability is not defined when P(A) or P(B) = 0, but independence is. Clearly,
if P(A) = 0, then A is independent of every $E \in \mathcal{S}$. Also, any event $A \in \mathcal{S}$ is
independent of $\emptyset$ and $\Omega$.


Theorem 1. If A and B are independent events, then 
P{A | B} = P(A) if P(B) > 0 
and 
P{B| A} = P(B) if P(A) > 0. 


Theorem 2. If A and B are independent, so are A and $B^c$, $A^c$ and B, and $A^c$
and $B^c$.


Proof.

$$\begin{aligned}
P(A^c B) &= P(B - (A \cap B)) \\
&= P(B) - P(A \cap B) \qquad \text{since } B \supseteq (A \cap B) \\
&= P(B)[1 - P(A)] \\
&= P(A^c)P(B).
\end{aligned}$$


Similarly, one proves that $A^c$ and $B^c$, and A and $B^c$, are independent.

We wish to emphasize that independence of events is not to be confused with
disjoint or mutually exclusive events. If two events, each with nonzero probability,
are mutually exclusive, they are obviously dependent since the occurrence of one
will automatically preclude the occurrence of the other. Similarly, if A and B are
independent and PA > 0, PB > 0, then A and B cannot be mutually exclusive.


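Example 2. A card is chosen at random from a deck of 52 cards. Let A be the
event that the card is an ace, and B, the event that it is a club. Then

$$P(A) = \tfrac{4}{52} = \tfrac{1}{13}, \qquad P(B) = \tfrac{13}{52} = \tfrac{1}{4},$$

and

$$P(AB) = P\{\text{ace of clubs}\} = \tfrac{1}{52},$$

so that A and B are independent.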

Example 3. Consider families with two children, and assume that all four
possible distributions of gender: BB, BG, GB, GG, where B stands for boy and
G for girl, are equally likely. Let E be the event that a randomly chosen family has
at most one girl, and F, the event that the family has children of both genders. Then

$$P(E) = \tfrac{3}{4}, \qquad P(F) = \tfrac{1}{2}, \qquad \text{and} \qquad P(EF) = \tfrac{1}{2},$$

so that E and F are not independent.

Now consider families with three children. Assuming that each of the eight
possible gender distributions is equally likely, we have

$$P(E) = \tfrac{1}{2}, \qquad P(F) = \tfrac{3}{4}, \qquad \text{and} \qquad P(EF) = \tfrac{3}{8},$$

so that E and F are independent.
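
The two computations in Example 3 can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original text; the helper name `family_events` is hypothetical, and exact arithmetic relies on the standard `fractions` module.

```python
from fractions import Fraction
from itertools import product


def family_events(num_children):
    """Return (P(E), P(F), P(EF)) when all gender sequences are equally likely.

    E = at most one girl, F = children of both genders (as in Example 3).
    """
    outcomes = list(product("BG", repeat=num_children))
    weight = Fraction(1, len(outcomes))
    e = {w for w in outcomes if w.count("G") <= 1}
    f = {w for w in outcomes if 0 < w.count("G") < num_children}
    return weight * len(e), weight * len(f), weight * len(e & f)


for n in (2, 3):
    p_e, p_f, p_ef = family_events(n)
    # The last column reports whether P(EF) = P(E)P(F) holds exactly.
    print(n, p_e, p_f, p_ef, p_ef == p_e * p_f)
```

Under these assumptions the product rule fails for two children and holds for three, in agreement with the example.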

An obvious extension of the concept of independence between two events A and
B to a given collection $\mathfrak{U}$ of events is to require that any two distinct events in $\mathfrak{U}$ be
independent.


Definition 2. Let $\mathfrak{U}$ be a family of events from $\mathcal{S}$. We say that the events $\mathfrak{U}$ are
pairwise independent if and only if for every pair of distinct events $A, B \in \mathfrak{U}$,

$$P(AB) = PA\,PB.$$

A much stronger and more useful concept is mutual or complete independence.

Definition 3. A family of events $\mathfrak{U}$ is said to be a mutually or completely
independent family if and only if for every finite subcollection $\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ of $\mathfrak{U}$
the following relation holds:

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} PA_{i_j}. \tag{2}$$


In what follows we omit the adjective mutual or complete and speak of independent
events. It is clear from Definition 3 that to check the independence of n events
$A_1, A_2, \ldots, A_n \in \mathcal{S}$, we must check the following $2^n - n - 1$ relations:

$$\begin{aligned}
P(A_i A_j) &= PA_i\,PA_j, && i \ne j;\ i, j = 1, 2, \ldots, n, \\
P(A_i A_j A_k) &= PA_i\,PA_j\,PA_k, && i, j, k \text{ distinct};\ i, j, k = 1, 2, \ldots, n, \\
&\ \ \vdots \\
P(A_1 A_2 \cdots A_n) &= PA_1\,PA_2 \cdots PA_n.
\end{aligned}$$

The first of these requirements is pairwise independence. Independence therefore
implies pairwise independence, but not conversely.


Example 4 (Wong [119]). Take four identical marbles. On the first, write symbols
$A_1 A_2 A_3$. On each of the other three, write $A_1$, $A_2$, $A_3$, respectively. Put the four
marbles in an urn and draw one at random. Let $E_i$ denote the event that the symbol
$A_i$ appears on the drawn marble. Then

$$P(E_1) = P(E_2) = P(E_3) = \tfrac{1}{2},$$

$$P(E_1 E_2) = P(E_2 E_3) = P(E_1 E_3) = \tfrac{1}{4},$$

and

$$P(E_1 E_2 E_3) = \tfrac{1}{4}. \tag{3}$$

It follows that although events $E_1$, $E_2$, $E_3$ are not independent, they are pairwise
independent.
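
The distinction between pairwise and mutual independence in Example 4 can be verified mechanically. Below is a small Python sketch under stated assumptions: the sample space is finite, events are Python sets of sample points, and the function names (`prob`, `satisfies_product_rule`, and so on) are illustrative only. The mutual-independence check visits every subcollection of size at least 2, that is, $2^n - n - 1$ relations.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, pmf):
    """P(event) for an event given as a set of sample points."""
    return sum((pmf[w] for w in event), Fraction(0))


def satisfies_product_rule(events, pmf):
    """Check P(intersection of events) == product of the P(event)."""
    joint = set(pmf)            # start from the whole sample space
    product = Fraction(1)
    for ev in events:
        joint &= ev
        product *= prob(ev, pmf)
    return prob(joint, pmf) == product


def pairwise_independent(events, pmf):
    return all(satisfies_product_rule(pair, pmf)
               for pair in combinations(events, 2))


def mutually_independent(events, pmf):
    # Every subcollection of size 2, 3, ..., n must satisfy the product rule.
    return all(satisfies_product_rule(sub, pmf)
               for size in range(2, len(events) + 1)
               for sub in combinations(events, size))


# Marble 0 carries A1 A2 A3; marbles 1, 2, 3 carry A1, A2, A3 respectively.
pmf = {marble: Fraction(1, 4) for marble in range(4)}
E1, E2, E3 = {0, 1}, {0, 2}, {0, 3}
print(pairwise_independent([E1, E2, E3], pmf))
print(mutually_independent([E1, E2, E3], pmf))
```

With the marble labelling assumed above, the first check succeeds and the second fails, which is the content of the example.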


Example 5 (Kac [46], pp. 22-23). In this example $P(E_1 E_2 E_3) = P(E_1)P(E_2)P(E_3)$,
but $E_1$, $E_2$, $E_3$ are not pairwise independent and hence not independent.
Let $\Omega = \{1, 2, 3, 4\}$, and let $p_i$ be the probability assigned to $\{i\}$, i = 1, 2, 3, 4.
Let $p_1 = \sqrt{2}/2 - \tfrac{1}{4}$, $p_2 = \tfrac{1}{4}$, $p_3 = \tfrac{3}{4} - \sqrt{2}/2$, $p_4 = \tfrac{1}{4}$. Let $E_1 = \{1, 3\}$, $E_2 = \{2, 3\}$,
$E_3 = \{3, 4\}$. Then

$$\begin{aligned}
P(E_1 E_2 E_3) = P\{3\} &= \frac{3}{4} - \frac{\sqrt{2}}{2} = \frac{1}{2}\left(1 - \frac{\sqrt{2}}{2}\right)\left(1 - \frac{\sqrt{2}}{2}\right) \\
&= (p_1 + p_3)(p_2 + p_3)(p_3 + p_4) \\
&= P(E_1)P(E_2)P(E_3).
\end{aligned}$$

But $P(E_1 E_2) = \tfrac{3}{4} - \sqrt{2}/2 \ne PE_1\,PE_2$, and it follows that $E_1$, $E_2$, $E_3$ are not
independent.

Example 6. A die is rolled repeatedly until a 6 turns up. We will show that event
A, that “a 6 will eventually show up,” is certain to occur. Let $A_k$ be the event that a 6
will show up for the first time on the kth throw. Let $A = \bigcup_{k=1}^{\infty} A_k$. Then

$$PA_k = \frac{1}{6}\left(\frac{5}{6}\right)^{k-1}, \qquad k = 1, 2, \ldots,$$

and

$$PA = \frac{1}{6}\sum_{k=1}^{\infty}\left(\frac{5}{6}\right)^{k-1} = \frac{1}{6} \cdot \frac{1}{1 - \frac{5}{6}} = 1.$$

Alternatively, we can use the corollary to Theorem 1.3.6. Let $B_n$ be the event that
a 6 does not show up on the first n trials. Clearly, $B_{n+1} \subseteq B_n$, and we have
$A^c = \bigcap_{n=1}^{\infty} B_n$. Thus

$$1 - PA = PA^c = P\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n \to \infty} P(B_n) = \lim_{n \to \infty}\left(\frac{5}{6}\right)^n = 0.$$
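
Both arguments in Example 6 can be illustrated numerically. The sketch below is an illustration rather than part of the original text; the function name `prob_six_by` is hypothetical. It computes the partial sums $\sum_{k=1}^{n} PA_k$ exactly and compares them with $1 - P(B_n) = 1 - (5/6)^n$.

```python
from fractions import Fraction


def prob_six_by(n):
    """P(first 6 occurs on or before throw n) = sum_{k=1}^{n} (1/6)(5/6)^(k-1)."""
    return sum(Fraction(1, 6) * Fraction(5, 6) ** (k - 1)
               for k in range(1, n + 1))


for n in (1, 5, 10, 50):
    partial = prob_six_by(n)
    # The partial sum equals 1 - P(B_n), where B_n = {no 6 in the first n throws}.
    print(n, float(partial), partial == 1 - Fraction(5, 6) ** n)
```

The partial sums increase toward 1 as n grows, which is the numerical counterpart of PA = 1.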


Example 7. A slip of paper is given to person A, who marks it with either a plus
or minus sign; the probability of her writing a plus sign is $\tfrac{1}{3}$. A passes the slip to
B, who may either leave it alone or change the sign before passing it to C. Next, C
passes the slip to D after perhaps changing the sign; finally, D passes it to a referee
after perhaps changing the sign. The referee sees a plus sign on the slip. It is known
that B, C, and D each change the sign with probability $\tfrac{2}{3}$. We shall compute the
probability that A originally wrote a plus.

Let N be the event that A wrote a plus sign, and M, the event that she wrote a
minus sign. Let E be the event that the referee saw a plus sign on the slip. We have

$$P\{N \mid E\} = \frac{P(N)P\{E \mid N\}}{P(M)P\{E \mid M\} + P(N)P\{E \mid N\}}.$$

Now

$$\begin{aligned}
P\{E \mid N\} &= P\{\text{the plus sign was either not changed or changed exactly twice}\} \\
&= \left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right)
\end{aligned}$$

and

$$\begin{aligned}
P\{E \mid M\} &= P\{\text{the minus sign was changed either once or three times}\} \\
&= 3\left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^3.
\end{aligned}$$

It follows that

$$P\{N \mid E\} = \frac{\tfrac{1}{3}\left[\left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right)\right]}{\tfrac{1}{3}\left[\left(\tfrac{1}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2\left(\tfrac{1}{3}\right)\right] + \tfrac{2}{3}\left[3\left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^3\right]} = \frac{13/81}{41/81} = \frac{13}{41}.$$
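
The Bayes computation of Example 7 can be reproduced with exact rational arithmetic. This is a hedged sketch: it assumes the values read from the example (a plus sign is written with probability 1/3, and each of the three handlers changes the sign independently with probability 2/3), and the helper `prob_flips_with_parity` is a hypothetical name introduced here.

```python
from fractions import Fraction
from math import comb


def prob_flips_with_parity(q, n, parity):
    """P(the number of sign changes among n independent handlers has the given parity)."""
    return sum(comb(n, j) * q ** j * (1 - q) ** (n - j)
               for j in range(n + 1) if j % 2 == parity)


p_plus = Fraction(1, 3)     # P(N): A writes a plus sign (assumed value)
q = Fraction(2, 3)          # each of B, C, D changes the sign (assumed value)

p_e_given_n = prob_flips_with_parity(q, 3, 0)   # plus survives: 0 or 2 changes
p_e_given_m = prob_flips_with_parity(q, 3, 1)   # minus becomes plus: 1 or 3 changes

posterior = (p_plus * p_e_given_n) / (
    p_plus * p_e_given_n + (1 - p_plus) * p_e_given_m
)
print(p_e_given_n, p_e_given_m, posterior)
```

Grouping the handlers by the parity of the number of sign changes avoids listing the eight change patterns by hand.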
PROBLEMS 1.6 
1. A biased coin is tossed until a head appears for the first time. Let p be the 


probability of a head, 0 < p < 1. What is the probability that the number of 
tosses required is odd? Even? 


2. Let A and B be two independent events defined on some probability space, and
let PA = 3, PB = 3. Find (a) $P(A \cup B)$, (b) $P\{A \mid A \cup B\}$, and (c) $P\{B \mid A \cup B\}$.


3. Let $A_1$, $A_2$, $A_3$ be three independent events. Show that $A_1^c$, $A_2^c$, and $A_3^c$ are
independent.


4. A biased coin with probability p, 0 < p < 1, of success (heads) is tossed until
for the first time, the same result occurs three times in succession (that is, three
heads or three tails in succession). Find the probability that the game will end at
the seventh throw.


5. A box contains 20 black and 30 green balls. One ball at a time is drawn at random,
its color is noted, and the ball is then replaced in the box for the next draw.

(a) Find the probability that the first green ball is drawn on the fourth draw.

(b) Find the probability that the third and fourth green balls are drawn on the
sixth and ninth draws, respectively.

(c) Let N be the trial at which the fifth green ball is drawn. Find the probability
that the fifth green ball is drawn on the nth draw. (Note that N takes values
5, 6, 7, ....)


6. An urn contains four red and four black balls. A sample of two balls is drawn
at random. If both balls drawn are of the same color, these balls are set aside
and a new sample is drawn. If the two balls drawn are of different colors, they
are returned to the urn and another sample is drawn. Assume that the draws are
independent and that the same sampling plan is pursued at each stage until all
balls are drawn.

(a) Find the probability that at least n samples are drawn before two balls of the
same color appear.

(b) Find the probability that after the first two samples are drawn, four balls are
left, two black and two red.


7. Let A, B, and C be three boxes with three, four, and five cells, respectively.
There are three yellow balls numbered 1 to 3, four green balls numbered 1 to 4,
and five red balls numbered 1 to 5. The yellow balls are placed at random in box
A, the green in B, and the red in C, with no cell receiving more than one ball.
Find the probability that only one of the boxes will show no matches.


8. A pond contains red and golden fish. There are 3000 red and 7000 golden fish,
of which 200 and 500, respectively, are tagged. Find the probability that a random
sample of 100 red and 200 golden fish will show 15 and 20 tagged fish,
respectively.


9. Let $(\Omega, \mathcal{S}, P)$ be a probability space. Let $A, B, C \in \mathcal{S}$ with PB and PC > 0.
If B and C are independent, show that

$$P\{A \mid B\} = P\{A \mid B \cap C\}\,PC + P\{A \mid B \cap C^c\}\,PC^c.$$

Conversely, if this relation holds, $P\{A \mid BC\} \ne P\{A \mid B\}$, and PA > 0, then
B and C are independent. (Strait [110])


10. Show that the converse of Theorem 2 also holds. Thus A and B are independent
if, and only if, A and $B^c$ are independent; and so on.


11. A lot of five identical batteries is life tested. The probability assignment is
assumed to be

$$P(A) = \int_A \frac{1}{\lambda} e^{-x/\lambda}\,dx$$

for any event $A \subseteq [0, \infty)$, where $\lambda > 0$ is a known constant. Thus the probability
that a battery fails after time t is given by

$$P(t, \infty) = \int_t^{\infty} \frac{1}{\lambda} e^{-x/\lambda}\,dx, \qquad t \ge 0.$$

If the times to failure of the batteries are independent, what is the probability
that at least one battery will be operating after $t_0$ hours?


12. On $\Omega = (a, b)$, $-\infty < a < b < \infty$, each subinterval is assigned a probability
proportional to the length of the interval. Find a necessary and sufficient
condition for two events to be independent.


13. A game of craps is played with a pair of fair dice as follows. A player rolls the
dice. If a sum of 7 or 11 shows up, the player wins; if a sum of 2, 3, or 12 shows
up, the player loses. Otherwise, the player continues to roll the pair of dice until
the sum is either 7 or the first number rolled. In the former case the player loses,
and in the latter the player wins.

(a) Find the probability that the player wins on the nth roll.
(b) Find the probability that the player wins the game.
(c) What is the probability that the game ends on (i) the first roll, (ii) the second
roll, and (iii) the third roll?


CHAPTER 2 


Random Variables and Their 
Probability Distributions 


2.1 INTRODUCTION 


In Chapter 1 we dealt essentially with random experiments that can be described by 
finite sample spaces. We studied the assignment and computation of probabilities of 
events. In practice, one observes a function defined on the space of outcomes. Thus, 
if a coin is tossed n times, one is not interested in knowing which of the $2^n$ n-tuples
in the sample space has occurred. Rather, one would like to know the number of 
heads in n tosses. In games of chance, one is interested in the net gain or loss of a 
certain player. Actually, in Chapter 1 we were concerned with such functions without 
defining the term random variable. Here we study the notion of a random variable 
and examine some of its properties. 

In Section 2.2 we define a random variable, and in Section 2.3 we study the notion 
of probability distribution of a random variable. Section 2.4 deals with some special 
types of random variables, and in Section 2.5 we consider functions of a random 
variable and their induced distributions. The fundamental difference between a ran- 
dom variable and a real-valued function of a real variable is the associated notion 
of a probability distribution. Nevertheless, our knowledge of advanced calculus or 
real analysis is the basic tool in the study of random variables and their probability 
distributions. 


2.2 RANDOM VARIABLES 


In Chapter 1 we studied properties of a set function P defined on a sample space
$(\Omega, \mathcal{S})$. Since P is a set function, it is not very easy to handle; we cannot perform
arithmetic or algebraic operations on sets. Moreover, in practice one frequently
observes some function of elementary events. When a coin is tossed repeatedly, which
replication resulted in heads is not of much interest. Rather, one is interested in the
number of heads, and consequently, the number of tails, that appear in, say, n tossings
of the coin. It is therefore desirable to introduce a point function on the sample space.
We can then use our knowledge of calculus or real analysis to study properties of P.

Definition 1. Let $(\Omega, \mathcal{S})$ be a sample space. A finite, single-valued function X that
maps $\Omega$ into $\mathcal{R}$ is called a random variable (RV) if the inverse images under X of all
Borel sets in $\mathcal{R}$ are events, that is, if

$$X^{-1}(B) = \{\omega\colon X(\omega) \in B\} \in \mathcal{S} \qquad \text{for all } B \in \mathfrak{B}. \tag{1}$$

To verify whether a real-valued function on $(\Omega, \mathcal{S})$ is an RV, it is not necessary to
check that (1) holds for all Borel sets $B \in \mathfrak{B}$. It suffices to verify (1) for any class $\mathfrak{A}$
of subsets of $\mathcal{R}$ that generates $\mathfrak{B}$. By taking $\mathfrak{A}$ to be the class of semiclosed intervals
$(-\infty, x]$, $x \in \mathcal{R}$, we get the following result.


Theorem 1. X is an RV if and only if for each $x \in \mathcal{R}$,

$$\{\omega\colon X(\omega) \le x\} = \{X \le x\} \in \mathcal{S}. \tag{2}$$


Remark 1. Note that the notion of probability does not enter into the definition
of an RV.


Remark 2. If X is an RV, the sets $\{X = x\}$, $\{a < X \le b\}$, $\{X < x\}$, $\{a \le X < b\}$,
$\{a < X < b\}$, $\{a \le X \le b\}$ are all events. Moreover, we could have used any
of these intervals to define an RV. For example, we could have used the following
equivalent definition: X is an RV if and only if

$$\{\omega\colon X(\omega) < x\} \in \mathcal{S} \qquad \text{for all } x \in \mathcal{R}. \tag{3}$$

We have

$$\{X < x\} = \bigcup_{n=1}^{\infty}\left\{X \le x - \frac{1}{n}\right\} \tag{4}$$

and

$$\{X \le x\} = \bigcap_{n=1}^{\infty}\left\{X < x + \frac{1}{n}\right\}. \tag{5}$$


Remark 3. In practice, (1) or (2) is a technical condition in the definition of an
RV which the reader may ignore and think of RVs simply as real-valued functions
defined on $\Omega$. It should be emphasized, though, that there do exist subsets of $\mathcal{R}$ that
do not belong to $\mathfrak{B}$, and hence there exist real-valued functions defined on $\Omega$ that are
not RVs, but the reader will not encounter them in practical applications.


Example 1. For any set $A \subseteq \Omega$, define

$$I_A(\omega) = \begin{cases} 0, & \omega \notin A, \\ 1, & \omega \in A. \end{cases}$$

$I_A(\omega)$ is called the indicator function of set A. $I_A$ is an RV if and only if $A \in \mathcal{S}$.

Example 2. Let $\Omega = \{H, T\}$, and $\mathcal{S}$ be the class of all subsets of $\Omega$. Define X by
X(H) = 1, X(T) = 0. Then

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset & \text{if } x < 0, \\ \{T\} & \text{if } 0 \le x < 1, \\ \{H, T\} & \text{if } 1 \le x, \end{cases}$$

and we see that X is an RV.


Example 3. Let $\Omega = \{HH, TT, HT, TH\}$ and $\mathcal{S}$ be the class of all subsets of $\Omega$.
Define X by

$$X(\omega) = \text{number of H's in } \omega.$$

Then X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0.

$$X^{-1}(-\infty, x] = \begin{cases} \emptyset, & x < 0, \\ \{TT\}, & 0 \le x < 1, \\ \{TT, HT, TH\}, & 1 \le x < 2, \\ \Omega, & 2 \le x. \end{cases}$$

Thus X is an RV.
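
For a finite sample space, the criterion of Theorem 1 can be tested directly by computing the inverse images $X^{-1}(-\infty, x]$. The following Python sketch is illustrative only (the names `power_set`, `inverse_image_halfline`, and `is_random_variable` are not from the text). It represents a σ-field as a set of frozensets and uses the fact that on a finite $\Omega$ the inverse image changes only at values actually taken by X.

```python
from itertools import combinations


def power_set(omega):
    """The class of all subsets of a finite sample space."""
    items = list(omega)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


def inverse_image_halfline(X, omega, x):
    """X^{-1}(-inf, x] = {w : X(w) <= x}."""
    return frozenset(w for w in omega if X(w) <= x)


def is_random_variable(X, omega, sigma_field):
    # It suffices to test one x below every value of X and x equal to each value,
    # because the inverse image is constant between consecutive values.
    values = sorted({X(w) for w in omega})
    test_points = [values[0] - 1] + values
    return all(inverse_image_halfline(X, omega, x) in sigma_field
               for x in test_points)


omega = ["HH", "TT", "HT", "TH"]


def num_heads(w):
    return w.count("H")


all_subsets = power_set(omega)
# A coarser sigma-field that only records the first toss (an assumed example).
first_toss_only = {frozenset(), frozenset({"HH", "HT"}),
                   frozenset({"TH", "TT"}), frozenset(omega)}

print(is_random_variable(num_heads, omega, all_subsets))
print(is_random_variable(num_heads, omega, first_toss_only))
```

The second check fails because $\{TT\} = X^{-1}(-\infty, 0]$ is not an event in the coarser σ-field, which shows that whether a function is an RV depends on $\mathcal{S}$ and not on any probability.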
Remark 4. Let $(\Omega, \mathcal{S})$ be a discrete sample space; that is, let $\Omega$ be a countable
set of points and $\mathcal{S}$ be the class of all subsets of $\Omega$. Then every numerical-valued
function defined on $(\Omega, \mathcal{S})$ is an RV.


Example 4. Let $\Omega = [0, 1]$ and $\mathcal{S} = \mathfrak{B} \cap [0, 1]$ be the σ-field of Borel sets on
[0, 1]. Define X on $\Omega$ by

$$X(\omega) = \omega, \qquad \omega \in [0, 1].$$

Clearly, X is an RV. Any Borel subset of $\Omega$ is an event.

Remark 5. Let X be an RV defined on $(\Omega, \mathcal{S})$ and a, b be constants. Then aX + b
is also an RV on $(\Omega, \mathcal{S})$. Moreover, $X^2$ is an RV and so also is 1/X, provided that
$\{X = 0\} = \emptyset$. For a general result, see Theorem 2.5.1.


PROBLEMS 2.2 
1. Let X be the number of heads in three tosses of a coin. What is $\Omega$? What are the
values that X assigns to points of $\Omega$? What are the events {X < 2.75}, {0.5 <
X < 1.72}?

2. A die is tossed two times. Let X be the sum of face values on the two tosses and
Y be the absolute value of the difference in face values. What is $\Omega$? What values
do X and Y assign to points of $\Omega$? Check to see whether X and Y are random
variables.


3. Let X be an RV. Is |X| also an RV? If X is an RV that takes only nonnegative
values, is $\sqrt{X}$ also an RV?


4. A die is rolled five times. Let X be the sum of face values. Write the events
{X = 4}, {X = 6}, {X = 30}, and {X > 29}.
5. Let $\Omega = [0, 1]$ and $\mathcal{S}$ be the Borel σ-field of subsets of $\Omega$. Define X on $\Omega$ as
follows: X(w) = wif 0 < w < 4 ri al 5 if 4 <w<1.Is X anRV? 
If so, what is the event {w: X(w) € ( i 5)}? 


6. Let $\mathfrak{A}$ be a class of subsets of $\mathcal{R}$ that generates $\mathfrak{B}$. Show that X is an RV on $\Omega$ if
and only if $X^{-1}(A) \in \mathcal{S}$ for all $A \in \mathfrak{A}$.


2.3 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE 


In Section 2.2 we introduced the concept of an RV and noted that the concept of 
probability on the sample space was not used in this definition. In practice, however, 
random variables are of interest only when they are defined on a probability space. 
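Let $(\Omega, \mathcal{S}, P)$ be a probability space, and let X be an RV defined on it.


Theorem 1. The RV X defined on the probability space $(\Omega, \mathcal{S}, P)$ induces a
probability space $(\mathcal{R}, \mathfrak{B}, Q)$ by means of the correspondence

$$Q(B) = P\{X^{-1}(B)\} = P\{\omega\colon X(\omega) \in B\} \qquad \text{for all } B \in \mathfrak{B}. \tag{1}$$

We write $Q = PX^{-1}$ and call Q or $PX^{-1}$ the (probability) distribution of X.

Proof. Clearly, $Q(B) \ge 0$ for all $B \in \mathfrak{B}$, and also $Q(\mathcal{R}) = P\{X \in \mathcal{R}\} = P(\Omega) = 1$.
Let $B_i \in \mathfrak{B}$, i = 1, 2, ..., with $B_i \cap B_j = \emptyset$, $i \ne j$. Since the inverse
image of a disjoint union of Borel sets is the disjoint union of their inverse images,
we have

$$Q\left(\bigcup_{i=1}^{\infty} B_i\right) = P\left\{X^{-1}\left(\bigcup_{i=1}^{\infty} B_i\right)\right\} = P\left\{\bigcup_{i=1}^{\infty} X^{-1}(B_i)\right\} = \sum_{i=1}^{\infty} PX^{-1}(B_i) = \sum_{i=1}^{\infty} Q(B_i).$$

It follows that $(\mathcal{R}, \mathfrak{B}, Q)$ is a probability space, and the proof is complete.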
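
When $\Omega$ is finite, the induced distribution $Q = PX^{-1}$ of Theorem 1 can be computed by pushing each point mass of P forward through X. The sketch below is a minimal illustration under that finiteness assumption; `induced_pmf` and `Q` are hypothetical helper names, and a Borel set B is represented by a Python predicate on real numbers.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def induced_pmf(P, X):
    """Push the point masses of P (a dict on a finite sample space) forward through X."""
    q = defaultdict(Fraction)
    for omega, mass in P.items():
        q[X(omega)] += mass
    return dict(q)


def Q(B, q):
    """Q(B) = P{X in B}, with the set B given as a predicate."""
    return sum((mass for x, mass in q.items() if B(x)), Fraction(0))


# Three tosses of a fair coin; X = number of heads (an assumed illustration).
P = {w: Fraction(1, 8) for w in product("HT", repeat=3)}
q = induced_pmf(P, lambda w: w.count("H"))

print(q)
print(Q(lambda x: x <= 1, q))      # Q(-inf, 1]
print(Q(lambda x: True, q))        # Q of the whole real line
```

Once Q is available, probabilities of events of the form {X in B} no longer require any reference to the original sample space.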




We note that Q is a set function and that set functions are not easy to handle. It is
therefore more practical to use (2.2.2) since then $Q(-\infty, x]$ is a point function. Let
us first introduce and study some properties of a special point function on $\mathcal{R}$.


Definition 1. A real-valued function F defined on $(-\infty, \infty)$ that is nondecreasing,
right continuous, and satisfies

$$F(-\infty) = 0 \qquad \text{and} \qquad F(+\infty) = 1$$

is called a distribution function (DF).


Remark 1. Recall that if F is a nondecreasing function on $\mathcal{R}$, then
$F(x-) = \lim_{t \uparrow x} F(t)$, $F(x+) = \lim_{t \downarrow x} F(t)$ exist and are finite. Also, $F(+\infty)$ and $F(-\infty)$
exist as $\lim_{t \to +\infty} F(t)$ and $\lim_{t \to -\infty} F(t)$, respectively. In general,

$$F(x-) \le F(x) \le F(x+),$$

and x is a jump point of F if and only if F(x+) and F(x-) exist but are unequal.
Thus a nondecreasing function F has only jump discontinuities. If we define

$$F^*(x) = F(x+) \qquad \text{for all } x,$$

we see that $F^*$ is nondecreasing and right continuous on $\mathcal{R}$. Thus in Definition 1
the nondecreasing part is very important. Some authors demand left instead of right
continuity in the definition of a DF.


Theorem 2. The set of discontinuity points of a DF F is at most countable.

Proof. Let (a, b] be a finite interval with at least n discontinuity points:

$$a < x_1 < x_2 < \cdots < x_n \le b.$$

Then

$$F(a) \le F(x_1-) < F(x_1) \le \cdots \le F(x_n-) < F(x_n) \le F(b).$$

Let $p_k = F(x_k) - F(x_k-)$, k = 1, 2, ..., n. Clearly,

$$\sum_{k=1}^{n} p_k \le F(b) - F(a),$$

and it follows that the number of points x in (a, b] with jump $p(x) > \varepsilon > 0$ is
at most $\varepsilon^{-1}\{F(b) - F(a)\}$. Thus for every integer N, the number of discontinuity
points with jump greater than 1/N is finite. It follows that there are no more than a
countable number of discontinuity points in every finite interval (a, b]. Since $\mathcal{R}$ is a
countable union of such intervals, the proof is complete.

Definition 2. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. Define a point function F(·)
on $\mathcal{R}$ by using (1), namely,

$$F(x) = Q(-\infty, x] = P\{\omega\colon X(\omega) \le x\} \qquad \text{for all } x \in \mathcal{R}. \tag{2}$$

The function F is called the distribution function of RV X.

If there is no confusion, we will write

$$F(x) = P\{X \le x\}.$$

The following result justifies our calling F as defined by (2) a DF.

Theorem 3. The function F defined in (2) is indeed a DF.

Proof. Let $x_1 < x_2$. Then $(-\infty, x_1] \subset (-\infty, x_2]$, and we have

$$F(x_1) = P\{X \le x_1\} \le P\{X \le x_2\} = F(x_2).$$

Since F is nondecreasing, it is sufficient to show that for any sequence of numbers
$x_n \downarrow x$, $x_1 > x_2 > \cdots > x_n > \cdots > x$, we have $F(x_n) \to F(x)$. Let
$A_k = \{\omega\colon X(\omega) \in (x, x_k]\}$. Then $A_k \in \mathcal{S}$ and $A_k \downarrow$. Also,

$$\lim_{k \to \infty} A_k = \bigcap_{k=1}^{\infty} A_k = \emptyset,$$

since none of the intervals $(x, x_k]$ contains x. It follows that $\lim_{k \to \infty} P(A_k) = 0$.
But

$$P(A_k) = P\{X \le x_k\} - P\{X \le x\} = F(x_k) - F(x),$$

so that

$$\lim_{k \to \infty} F(x_k) = F(x),$$

and F is right continuous.
Finally, let $\{x_n\}$ be a sequence of numbers decreasing to $-\infty$. Then

$$\{X \le x_n\} \supseteq \{X \le x_{n+1}\} \qquad \text{for each } n$$

and

$$\lim_{n \to \infty}\{X \le x_n\} = \bigcap_{n=1}^{\infty}\{X \le x_n\} = \emptyset.$$

Therefore,

$$F(-\infty) = \lim_{n \to \infty} P\{X \le x_n\} = P\left\{\lim_{n \to \infty}\{X \le x_n\}\right\} = 0.$$

Similarly,

$$F(+\infty) = \lim_{x_n \to \infty} P\{X \le x_n\} = 1,$$

and the proof is complete.


The next result, stated without proof, establishes a correspondence between the 
induced probability Q on (R, B) and a point function F defined on R. 


Theorem 4. Given a probability Q on $(\mathcal{R}, \mathfrak{B})$, there exists a distribution function
F satisfying

$$Q(-\infty, x] = F(x) \qquad \text{for all } x \in \mathcal{R}, \tag{3}$$

and conversely, given a DF F, there exists a unique probability Q defined on $(\mathcal{R}, \mathfrak{B})$
that satisfies (3).


For proof, see Chung [14, pp. 23-24].

Theorem 5. Every DF is the DF of an RV on some probability space.


Proof. Let F be a DF. From Theorem 4 it follows that there exists a unique
probability Q defined on $\mathcal{R}$ that satisfies

$$Q(-\infty, x] = F(x) \qquad \text{for all } x \in \mathcal{R}.$$

Let $(\mathcal{R}, \mathfrak{B}, Q)$ be the probability space on which we define

$$X(\omega) = \omega, \qquad \omega \in \mathcal{R}.$$

Then

$$Q\{\omega\colon X(\omega) \le x\} = Q(-\infty, x] = F(x),$$

and F is the DF of RV X.


Remark 2. If X is an RV on $(\Omega, \mathcal{S}, P)$, we have seen (Theorem 3) that
$F(x) = P\{X \le x\}$ is a DF associated with X. Theorem 5 assures us that to every DF F
we can associate some RV. Thus, given an RV, there exists a DF, and conversely. In
this book when we speak of an RV we will assume that it is defined on a probability
space.

Example 1. Let X be defined on $(\Omega, \mathcal{S}, P)$ by

$$X(\omega) = c \qquad \text{for all } \omega \in \Omega.$$

Then

$$P\{X = c\} = 1,$$

$$F(x) = Q(-\infty, x] = P\{X^{-1}(-\infty, x]\} = 0 \qquad \text{if } x < c,$$

and

$$F(x) = 1 \qquad \text{if } x \ge c.$$

Example 2. Let $\Omega = \{H, T\}$ and X be defined by

$$X(H) = 1, \qquad X(T) = 0.$$

If P assigns equal mass to {H} and {T}, then

$$P\{X = 0\} = \tfrac{1}{2} = P\{X = 1\}$$

and

$$F(x) = Q(-\infty, x] = \begin{cases} 0, & x < 0, \\ \tfrac{1}{2}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$

Example 3. Let $\Omega = \{(i, j)\colon i, j \in \{1, 2, 3, 4, 5, 6\}\}$ and $\mathcal{S}$ be the set of all
subsets of $\Omega$. Let $P\{(i, j)\} = 1/6^2$ for all $6^2$ pairs (i, j) in $\Omega$. Define

$$X(i, j) = i + j, \qquad 1 \le i, j \le 6.$$

Then

$$F(x) = Q(-\infty, x] = P\{X \le x\} = \begin{cases} 0, & x < 2, \\ \tfrac{1}{36}, & 2 \le x < 3, \\ \tfrac{3}{36}, & 3 \le x < 4, \\ \tfrac{6}{36}, & 4 \le x < 5, \\ \ \vdots \\ \tfrac{35}{36}, & 11 \le x < 12, \\ 1, & 12 \le x. \end{cases}$$

Example 4. We return to Example 2.2.4. For every subinterval I of [0, 1], let
P(I) be the length of the interval. Then $(\Omega, \mathcal{S}, P)$ is a probability space, and the DF
of RV $X(\omega) = \omega$, $\omega \in \Omega$, is given by F(x) = 0 if x < 0,
$F(x) = P\{\omega\colon X(\omega) \le x\} = P([0, x]) = x$ if $x \in [0, 1]$, and F(x) = 1 if $x \ge 1$.
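
The step-function DF of Example 3 (the sum of two fair dice) can be tabulated by direct counting. The sketch below is illustrative; the function name `dice_sum_df` is an assumption made here, and the DF is evaluated exactly with `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product


def dice_sum_df(x):
    """F(x) = P{X <= x}, where X is the sum of the faces of two fair dice."""
    favorable = sum(1 for i, j in product(range(1, 7), repeat=2) if i + j <= x)
    return Fraction(favorable, 36)


# F is constant between integers and jumps at 2, 3, ..., 12.
for x in (1.9, 2, 3.5, 4, 11.5, 12):
    print(x, dice_sum_df(x))
```

Evaluating at non-integer points such as 3.5 makes the right-continuous step structure visible: the value there equals the value at 3.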
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PROBLEMS 2.3 


1. Write the DF of RV X defined in Problem 2.2.1, assuming that the coin is fair. 


2. What is the DF of RV Y defined in Problem 2.2.2, assuming that the die is not 
loaded? 
3. Do the following functions define DFs? 
(a) F(x) =Oifx <0,=xif0 <x < },and=1ifx > 3. 
(b) $F(x) = (1/\pi)\tan^{-1} x$, $-\infty < x < \infty$.
(c) F(x) = 0 if $x \le 1$, and $= 1 - (1/x)$ if 1 < x.
(d) $F(x) = 1 - e^{-x}$ if $x \ge 0$, and = 0 if x < 0.
4. Let X be an RV with DF F.
(a) If F is the DF defined in Problem 3(a), find P{X > 4}, P{Z < X < 3}. 
(b) If F is the DF defined in Problem 3(d), find $P\{-\infty < X < 2\}$.


2.4 DISCRETE AND CONTINUOUS RANDOM VARIABLES 


Let X be an RV defined on some fixed but otherwise arbitrary probability space
$(\Omega, \mathcal{S}, P)$, and let F be the DF of X. In this book we restrict ourselves mainly to two
cases: the case in which the RV assumes at most a countable number of values, and
hence its DF is a step function, and that in which the DF F is (absolutely) continuous.


Definition 1. An RV X defined on $(\Omega, \mathcal{S}, P)$ is said to be of the discrete type, or
simply discrete, if there exists a countable set $E \subset \mathcal{R}$ such that $P\{X \in E\} = 1$. The
points of E that have positive mass are called jump points or points of increase of
the DF of X, and their probabilities are called jumps of the DF.


Note that $E \in \mathfrak{B}$ since every one-point set is in $\mathfrak{B}$. Indeed, if $x \in \mathcal{R}$, then

$$\{x\} = \bigcap_{n=1}^{\infty}\left(x - \frac{1}{n},\ x + \frac{1}{n}\right]. \tag{1}$$

Thus $\{X \in E\}$ is an event. Let X take on the value $x_i$ with probability $p_i$
(i = 1, 2, ...). We have

$$P\{\omega\colon X(\omega) = x_i\} = p_i, \qquad i = 1, 2, \ldots, \qquad p_i \ge 0 \text{ for all } i.$$

Then $\sum_{i=1}^{\infty} p_i = 1$.


Definition 2. The collection of numbers $\{p_i\}$ satisfying $P\{X = x_i\} = p_i \ge 0$,
for all i and $\sum_{i=1}^{\infty} p_i = 1$, is called the probability mass function (PMF) of RV X.

The DF F of X is given by

$$F(x) = P\{X \le x\} = \sum_{x_i \le x} p_i. \tag{2}$$

If $I_A$ denotes the indicator function of the set A, we may write

$$X(\omega) = \sum_{i=1}^{\infty} x_i I_{[X = x_i]}(\omega). \tag{3}$$

Let us define a function $\varepsilon(x)$ as follows:

$$\varepsilon(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Then we have

$$F(x) = \sum_{i=1}^{\infty} p_i\,\varepsilon(x - x_i). \tag{4}$$

Example 1. The simplest example is that of an RV X degenerate at c, $P\{X = c\} = 1$:

$$F(x) = \varepsilon(x - c) = \begin{cases} 0, & x < c, \\ 1, & x \ge c. \end{cases}$$


Example 2. A box contains good and defective items. If an item drawn is good,
we assign the number 1 to the drawing; otherwise, the number 0. Let p be the
probability of drawing at random a good item. Then

$$P\{X = 1\} = p, \qquad P\{X = 0\} = 1 - p,$$

and

$$F(x) = P\{X \le x\} = \begin{cases} 0, & x < 0, \\ 1 - p, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$


Example 3. Let X be an RV with PMF

$$P\{X = k\} = \frac{6}{\pi^2} \cdot \frac{1}{k^2}, \qquad k = 1, 2, \ldots.$$

Then

$$F(x) = \frac{6}{\pi^2}\sum_{k=1}^{\infty} \frac{1}{k^2}\,\varepsilon(x - k).$$

Theorem 1. Let $\{p_k\}$ be a collection of nonnegative real numbers such that
$\sum_{k=1}^{\infty} p_k = 1$. Then $\{p_k\}$ is the PMF of some RV X.
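
Representation (4) translates directly into code for a PMF with finitely many mass points. The following is a minimal sketch under that assumption; `eps` and `df_from_pmf` are hypothetical names, and the value p = 3/4 in the usage lines is chosen only for illustration of Example 2.

```python
from fractions import Fraction


def eps(x):
    """The unit step: 1 for x >= 0 and 0 for x < 0."""
    return 1 if x >= 0 else 0


def df_from_pmf(pmf):
    """Return F with F(x) = sum_i p_i * eps(x - x_i) for a finite PMF {x_i: p_i}."""
    def F(x):
        return sum(p * eps(x - xi) for xi, p in pmf.items())
    return F


# Example 2 with an assumed p = 3/4: P{X = 1} = p, P{X = 0} = 1 - p.
p = Fraction(3, 4)
F = df_from_pmf({0: 1 - p, 1: p})

for x in (-1, 0, Fraction(1, 2), 1, 2):
    print(x, F(x))
```

Because `eps` is 1 at zero, the resulting F is right continuous at every mass point, as a DF must be.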


We next consider RVs associated with DFs that have no jump points. The DF of 
such an RV is continuous. We shall restrict our attention to a special subclass of such 
RVs. 


Definition 3. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$ with DF F. Then X is said to
be of the continuous type (or simply, continuous) if F is absolutely continuous, that
is, if there exists a nonnegative function f(x) such that for every real number x we
have

$$F(x) = \int_{-\infty}^{x} f(t)\,dt. \tag{5}$$

The function f is called the probability density function (PDF) of the RV X.


Note that $f \ge 0$ and satisfies $\lim_{x \to +\infty} F(x) = F(+\infty) = \int_{-\infty}^{\infty} f(t)\,dt = 1$.
Let a and b be any two real numbers with a < b. Then

$$P\{a < X \le b\} = F(b) - F(a) = \int_a^b f(t)\,dt.$$

In view of remarks following Definition 2.2.1, the following result holds.


Theorem 2. Let X be an RV of the continuous type with PDF f. Then for every
Borel set $B \in \mathfrak{B}$,

$$P\{X \in B\} = \int_B f(t)\,dt. \tag{6}$$

If F is absolutely continuous and f is continuous at x, we have

$$F'(x) = \frac{dF(x)}{dx} = f(x). \tag{7}$$


Theorem 3. Every nonnegative real function f that is integrable over $\mathcal{R}$ and
satisfies

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

is the PDF of some continuous RV X.

Proof. In view of Theorem 2.3.5, it suffices to show that there corresponds a DF
F to f. Define

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad x \in \mathcal{R}.$$

Then $F(-\infty) = 0$, $F(+\infty) = 1$, and if $x_2 > x_1$,

$$F(x_2) = \left(\int_{-\infty}^{x_1} + \int_{x_1}^{x_2}\right) f(t)\,dt \ge \int_{-\infty}^{x_1} f(t)\,dt = F(x_1).$$

Finally, F is (absolutely) continuous and hence continuous from the right.


Remark 1. In the discrete case, P{X = a} is the probability that X takes the
value a. In the continuous case, f(a) is not the probability that X takes the value a.
Indeed, if X is of the continuous type, it assumes every value with probability 0.


Theorem 4. Let X be any RV. Then

$$P\{X = a\} = \lim_{\substack{t \to a \\ t < a}} P\{t < X \le a\}. \tag{8}$$

Proof. Let $t_1 < t_2 < \cdots < a$, $t_n \to a$, and write

$$A_n = \{t_n < X \le a\}.$$

Then $A_n$ is a nonincreasing sequence of events that converges to
$\bigcap_{n=1}^{\infty} A_n = \{X = a\}$. It follows that $\lim_{n \to \infty} PA_n = P\{X = a\}$.


Remark 2. Since $P\{t < X \le a\} = F(a) - F(t)$, it follows that

$$\lim_{\substack{t \to a \\ t < a}} P\{t < X \le a\} = P\{X = a\} = F(a) - \lim_{\substack{t \to a \\ t < a}} F(t) = F(a) - F(a-).$$

Thus F has a jump discontinuity at a if and only if P{X = a} > 0; that is, F is
continuous at a if and only if P{X = a} = 0. If X is an RV of the continuous type,
P{X = a} = 0 for all $a \in \mathcal{R}$. Moreover,

$$P\{X \in \mathcal{R} - \{a\}\} = 1.$$

This justifies Remark 1.3.4.


Remark 3. The set of real numbers x for which a DF F increases is called the
support of F. Let X be the RV with DF F, and let S be the support of F. Then
$P(X \in S) = 1$ and $P(X \in S^c) = 0$. The set of positive integers is the support of the
DF in Example 3, and the open interval (0, 1) is the support of F in Example 4.


Example 4. Let X be an RV with DF F given by (Fig. 1)

$$F(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x \le 1, \\ 1, & 1 < x. \end{cases}$$

Differentiating F with respect to x at continuity points of f, we get

$$f(x) = F'(x) = \begin{cases} 0, & x < 0 \text{ or } x > 1, \\ 1, & 0 < x < 1. \end{cases}$$

The function f is not continuous at x = 0 or at x = 1 (Fig. 2). We may define f(0)
and f(1) in any manner. Choosing f(0) = f(1) = 0, we have

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$P\{0.4 < X \le 0.6\} = F(0.6) - F(0.4) = 0.2.$$

Fig. 1.

Fig. 2.


Example 5. Let X have the triangular PDF (Fig. 3)

$$f(x) = \begin{cases} x, & 0 < x \le 1, \\ 2 - x, & 1 < x \le 2, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 3. Graph of f.

Fig. 4. Graph of F.

It is easy to check that f is a PDF. For the DF F of X we have (Fig. 4)

$$F(x) = \begin{cases} 0 & \text{if } x \le 0, \\[4pt] \displaystyle\int_0^x t\,dt = \frac{x^2}{2} & \text{if } 0 < x \le 1, \\[4pt] \displaystyle\int_0^1 t\,dt + \int_1^x (2 - t)\,dt = 2x - \frac{x^2}{2} - 1 & \text{if } 1 < x \le 2, \\[4pt] 1 & \text{if } x \ge 2. \end{cases}$$

Then

$$P\{0.3 < X \le 1.5\} = P\{X \le 1.5\} - P\{X \le 0.3\} = 0.83.$$

Example 6. Let k > 0 be a constant, and

$$f(x) = \begin{cases} kx(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

Then $\int_0^1 f(x)\,dx = k/6$. It follows that f(x) defines a PDF if k = 6. We have

$$P\{X > 0.3\} = 1 - 6\int_0^{0.3} x(1 - x)\,dx = 0.784.$$

We conclude this discussion by emphasizing that the two types of RVs considered
above form only a part of the class of all RVs. These two classes, however, contain
practically all the random variables that arise in practice. We note without proof (see
Chung [14, p. 9]) that every DF F can be decomposed into two parts according to

$$F(x) = aF_d(x) + (1 - a)F_c(x). \tag{9}$$

Here $F_d$ and $F_c$ are both DFs; $F_d$ is the DF of a discrete RV, while $F_c$ is a continuous
(not necessarily absolutely continuous) DF. In fact, $F_c$ can be further decomposed,
but we will not go into that (see Chung [14, p. 11]).


Example 7. Let X be an RV with DF 


\[
F(x) = \begin{cases}
0, & x < 0, \\
\dfrac{1}{2}, & x = 0, \\
\dfrac{1}{2} + \dfrac{x}{2}, & 0 < x < 1, \\
1, & 1 \le x.
\end{cases}
\]

Note that the DF F has a jump at x = 0 and F is continuous (in fact, absolutely 
continuous) in the interval (0, 1). F is the DF of an RV X that is neither discrete nor 
continuous. We can write 


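\[
F(x) = \tfrac{1}{2} F_d(x) + \tfrac{1}{2} F_c(x),
\]

where

\[
F_d(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}
\]

and

\[
F_c(x) = \begin{cases} 0, & x \le 0, \\ x, & 0 < x < 1, \\ 1, & 1 \le x. \end{cases}
\]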


Here $F_d(x)$ is the DF of the RV degenerate at x = 0, and $F_c(x)$ is the DF with PDF

\[
f_c(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}
\]


PROBLEMS 2.4


1. Let

\[
p_k = p(1 - p)^k, \qquad k = 0, 1, 2, \ldots, \quad 0 < p < 1.
\]

Does $\{p_k\}$ define the PMF of some RV? What is the DF of this RV? If X is an
RV with PMF $\{p_k\}$, what is $P\{n \le X \le N\}$, where n, N (N > n) are positive
integers?


2. In Problem 2.3.3, find the PDF associated with the DFs of parts (b), (c), and (d). 


3. Does the function $f_\theta(x) = \theta^2 x e^{-\theta x}$ if x > 0, and = 0 if $x \le 0$, where $\theta > 0$,
define a PDF? Find the DF associated with $f_\theta(x)$; if X is an RV with PDF $f_\theta(x)$,
find $P\{X \ge 1\}$.


4. Does the function $f_\theta(x) = \{(x + 1)/[\theta(\theta + 1)]\} e^{-x/\theta}$ if x > 0, and = 0
otherwise, where $\theta > 0$, define a PDF? Find the corresponding DF.


5. For what values of K do the following functions define the PMF of some RV? 
(a) $f(x) = K(\lambda^x/x!)$, x = 0, 1, 2, ..., $\lambda > 0$.
(b) f(x) = K/N,x =1,2,...,N. 

6. Show that the function 


\[
f(x) = \tfrac{1}{2} e^{-|x|}, \qquad -\infty < x < \infty,
\]


is a PDF. Find its DF. 
7. For the PDF f(x) = x if0 < x < l,and= 2—xif 1 < x < 2, find 
P{g<X <F). 
8. Which of the following functions are density functions? 
(a) f(x) =x(2—x), 0 <x <2, and 0 elsewhere. 
(b) f(x) = x(2x — 1), 0 < x < 2, and 0 elsewhere. 
(c) $f(x) = (1/\lambda) \exp\{-(x - \theta)/\lambda\}$, $x > \theta$, and 0 elsewhere, $\lambda > 0$.
(d) $f(x) = \sin x$, $0 < x < \pi/2$, and 0 elsewhere.
(e) f(x) = Oforx < 0, = (x + 1)/9 forO < x < 1, = 2(2x — 1)/9 for 
1 <x < 3, = 2(5 —2x)/9 for} <x < 1,= % for2 < x < 5,and0 
elsewhere. 
(f) $f(x) = 1/[\pi(1 + x^2)]$, $x \in \mathcal{R}$.
9. Are the following functions distribution functions? If so, find the corresponding 
density or probability functions. 
(a) F(x) = 0 for x < 0, = x/2 for $0 \le x < 1$, = 1/2 for $1 \le x < 2$, = x/4 for
$2 \le x < 4$, and = 1 for $x \ge 4$.
(b) F(x) = 0 if $x < -\theta$, $= \tfrac{1}{2}(x/\theta + 1)$ if $|x| \le \theta$, and 1 for $x > \theta$, where
$\theta > 0$.
(c) F(x) = 0 if x < 0, and $= 1 - (1 + x)\exp(-x)$ if $x \ge 0$.
(d) F(x) = 0 if x < 1, $= (x - 1)^2/8$ if $1 \le x < 3$, and 1 for $x \ge 3$.
(e) F(x) =Oifx <0,and=1—e-* ifx >0. 
10. Suppose that P(X > x) is given for a random variable X (of the continuous 


type) for all x. How will you find the corresponding density function? In partic- 

ular, find the density function in each of the following cases: 

(a) P(X > x) = lifx < 0, and P(X > x) =e forx > 0;A > Oisa 
constant. 

(b) P(X > x) = 1lifx <0,and=(1+x/d)™, forx > 0, 4 > Ois aconstant. 

(c) P(X > x) = 1 if $x \le 0$, and $= 3/(1 + x)^2 - 2/(1 + x)^3$ if x > 0.

(d) P(X > x) = 1 if $x \le x_0$, and $= (x_0/x)^\alpha$ if $x > x_0$; $x_0 > 0$ and $\alpha > 0$ are
constants.


2.5 FUNCTIONS OF A RANDOM VARIABLE 


Let X be an RV with a known distribution, and let g be a function defined on the real 
line. We seek the distribution of Y = g(X), provided that Y is also an RV. We first 
prove the following result. 


Theorem 1. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. Also, let g be a
Borel-measurable function on $\mathcal{R}$. Then g(X) is also an RV.


Proof. For $y \in \mathcal{R}$, we have

\[
\{g(X) \le y\} = \{X \in g^{-1}(-\infty, y]\},
\]

and since g is Borel-measurable, $g^{-1}(-\infty, y]$ is a Borel set. It follows that
$\{g(X) \le y\} \in \mathcal{S}$, and the proof is complete.


Theorem 2. Given an RV X with a known DF, the distribution of the RV Y =
g(X), where g is a Borel-measurable function, is determined.


Proof. Indeed, for all $y \in \mathcal{R}$,

\[
P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}. \tag{1}
\]


In what follows we always assume that the functions under consideration are 
Borel-measurable. 


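Example 1. Let X be an RV with DF F. Then |X|, aX + b (where $a \ne 0$ and b
are constants), $X^k$ (where $k \ge 0$ is an integer), and $|X|^\alpha$ ($\alpha > 0$) are all RVs. Define

\[
X^+ = \begin{cases} X, & X \ge 0, \\ 0, & X < 0, \end{cases}
\]

and

\[
X^- = \begin{cases} X, & X \le 0, \\ 0, & X > 0. \end{cases}
\]

Then $X^+$, $X^-$ are also RVs. We have

\[
\begin{aligned}
P\{|X| \le y\} &= P\{-y \le X \le y\} = P\{X \le y\} - P\{X < -y\} \\
&= F(y) - F(-y) + P\{X = -y\}, \qquad y > 0; \\
P\{aX + b \le y\} &= P\{aX \le y - b\} \\
&= \begin{cases} P\left\{X \le \dfrac{y - b}{a}\right\} & \text{if } a > 0, \\ P\left\{X \ge \dfrac{y - b}{a}\right\} & \text{if } a < 0; \end{cases}
\end{aligned}
\]

and

\[
P\{X^+ \le y\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{X \le 0\} & \text{if } y = 0, \\ P\{X < 0\} + P\{0 \le X \le y\} & \text{if } y > 0. \end{cases}
\]

Similarly,

\[
P\{X^- \le y\} = \begin{cases} 1 & \text{if } y \ge 0, \\ P\{X \le y\} & \text{if } y < 0. \end{cases}
\]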

Let X be an RV of the discrete type and A be the countable set such that
$P\{X \in A\} = 1$ and $P\{X = x\} > 0$ for $x \in A$. Let Y = g(X) be a one-to-one mapping from
A onto some set B. Then the inverse map, $g^{-1}$, is a single-valued function of y. To
find $P\{Y = y\}$, we note that

\[
P\{Y = y\} = \begin{cases} P\{g(X) = y\} = P\{X = g^{-1}(y)\}, & y \in B, \\ 0, & y \in B^c. \end{cases}
\]
Example 2. Let X be a Poisson RV with PMF

\[
P\{X = k\} = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots; \ \lambda > 0, \\ 0, & \text{otherwise}. \end{cases}
\]

Let $Y = X^2 + 3$. Then $y = x^2 + 3$ maps A = {0, 1, 2, ...} onto B = {3, 4, 7, 12,
19, 28, ...}. The inverse map is $x = \sqrt{y - 3}$, and since there are no negative values
in A, we take the positive square root of $y - 3$. We have

\[
P\{Y = y\} = P\{X = \sqrt{y - 3}\} = \frac{e^{-\lambda} \lambda^{\sqrt{y - 3}}}{(\sqrt{y - 3})!}, \qquad y \in B,
\]

and $P\{Y = y\} = 0$ elsewhere.
Actually, the restriction to a single-valued inverse on g is not necessary. If g has a
finite (or even a countable) number of inverses for each y, from countable additivity
of P we have

\[
\begin{aligned}
P\{Y = y\} = P\{g(X) = y\} &= P\Bigl\{\bigcup_a [X = a,\ g(a) = y]\Bigr\} \\
&= \sum_a P\{X = a,\ g(a) = y\}.
\end{aligned}
\]
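The summation over inverses translates directly into code. The Python sketch below is an added illustration rather than part of the original text: the helper `pmf_of_function` and the PMF on {-1, 0, 1, 2} are hypothetical, chosen only to show how probabilities of points with the same image under g are pooled.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(pmf_x, g):
    """PMF of Y = g(X): sum P{X = a} over all support points a with g(a) = y."""
    pmf_y = defaultdict(Fraction)
    for a, prob in pmf_x.items():
        pmf_y[g(a)] += prob
    return dict(pmf_y)


# Hypothetical PMF, used only for illustration.
pmf_x = {
    -1: Fraction(1, 4),
    0: Fraction(1, 4),
    1: Fraction(1, 3),
    2: Fraction(1, 6),
}

# g(x) = x^2 is not one-to-one on this support: -1 and 1 both map to 1.
pmf_y = pmf_of_function(pmf_x, lambda a: a * a)
print(pmf_y)
```

When g is one-to-one on the support, each y receives exactly one term and the function reduces to the single-valued case treated first.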
Example 3. Let X be an RV with PMF 


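\[
P\{X = -2\} = \tfrac{1}{5}, \quad P\{X = -1\} = \tfrac{1}{6}, \quad P\{X = 0\} = \tfrac{1}{5}, \quad
P\{X = 1\} = \tfrac{1}{15}, \quad \text{and} \quad P\{X = 2\} = \tfrac{11}{30}.
\]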


Let $Y = X^2$. Then

\[
A = \{-2, -1, 0, 1, 2\} \quad \text{and} \quad B = \{0, 1, 4\}.
\]


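We have

\[
P\{Y = y\} = \begin{cases} \tfrac{1}{5}, & y = 0, \\ \tfrac{1}{6} + \tfrac{1}{15} = \tfrac{7}{30}, & y = 1, \\ \tfrac{1}{5} + \tfrac{11}{30} = \tfrac{17}{30}, & y = 4. \end{cases}
\]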


The case in which X is an RV of the continuous type is not as simple. First we note 
that, if X is a continuous RV and g is some Borel-measurable function, Y = g(X) 
may not be an RV of the continuous type. 


Example 4. Let X be an RV with uniform distribution on [-1, 1]; that is, the
PDF of X is f(x) = 1/2, $-1 \le x \le 1$, and = 0 elsewhere. Let $Y = X^+$. Then, from
Example 1,

\[
P\{Y \le y\} = \begin{cases} 0, & y < 0, \\ \tfrac{1}{2}, & y = 0, \\ \tfrac{1}{2} + \tfrac{1}{2}y, & 1 \ge y > 0, \\ 1, & y > 1. \end{cases}
\]

We see that the DF of Y has a jump at y = 0 and that Y is neither discrete nor
continuous. Note that all we require is that $P\{X < 0\} > 0$ for $X^+$ to be of the mixed
type.


Example 4 shows that we need some conditions on g to ensure that g(X) is also 
an RV of the continuous type whenever X is continuous. This is the case when g 
is a continuous monotonic function. A sufficient condition is given in the following 
theorem. 


Theorem 3. Let X be an RV of the continuous type with PDF f. Let y = g(x)
be differentiable for all x and either $g'(x) > 0$ for all x or $g'(x) < 0$ for all x. Then
Y = g(X) is also an RV of the continuous type with PDF given by

\[
h(y) = \begin{cases} f[g^{-1}(y)] \left| \dfrac{d}{dy} g^{-1}(y) \right|, & \alpha < y < \beta, \\ 0, & \text{otherwise}, \end{cases} \tag{2}
\]

where $\alpha = \min\{g(-\infty), g(+\infty)\}$ and $\beta = \max\{g(-\infty), g(+\infty)\}$.

Proof. If g is differentiable for all x and $g'(x) > 0$ for all x, then g is continuous
and strictly increasing, the limits $\alpha$, $\beta$ exist (may be infinite), and the inverse function
$x = g^{-1}(y)$ exists, is strictly increasing, and is differentiable. The DF of Y for
$\alpha < y < \beta$ is given by

\[
P\{Y \le y\} = P\{X \le g^{-1}(y)\}.
\]

The PDF of Y is obtained on differentiation. We have

\[
h(y) = \frac{d}{dy} P\{Y \le y\} = f[g^{-1}(y)] \frac{d}{dy} g^{-1}(y).
\]

Similarly, if $g' < 0$, then g is strictly decreasing and we have

\[
\begin{aligned}
P\{Y \le y\} &= P\{X \ge g^{-1}(y)\} \\
&= 1 - P\{X \le g^{-1}(y)\} \qquad (X \text{ is a continuous RV}),
\end{aligned}
\]

so that

\[
h(y) = -f[g^{-1}(y)] \cdot \frac{d}{dy} g^{-1}(y).
\]

Since g and $g^{-1}$ are both strictly decreasing, $(d/dy)\, g^{-1}(y)$ is negative and (2) follows.

Note that

\[
\frac{d}{dy} g^{-1}(y) = \left. \frac{1}{dg(x)/dx} \right|_{x = g^{-1}(y)},
\]

so that (2) may be rewritten as

\[
h(y) = \left. \frac{f(x)}{|dg(x)/dx|} \right|_{x = g^{-1}(y)}, \qquad \alpha < y < \beta. \tag{3}
\]


Remark 1. The key to computation of the induced distribution of Y = g(X)
from the distribution of X is (1). If the conditions of Theorem 3 are satisfied, we
are able to identify the set $\{X \in g^{-1}(-\infty, y]\}$ as $\{X \le g^{-1}(y)\}$ or $\{X \ge g^{-1}(y)\}$,
according to whether g is increasing or decreasing. In practice, Theorem 3 is quite
useful, but whenever the conditions are violated, one should return to (1) to compute
the induced distribution. This is the case, for example, in Examples 7 and 8 and
Theorem 4 below.


Remark 2. If the PDF f of X vanishes outside an interval [a, b] of finite length,
we need only to assume that g is differentiable in (a, b), and either $g'(x) > 0$ or
$g'(x) < 0$ throughout the interval. Then we take

\[
\alpha = \min\{g(a), g(b)\} \quad \text{and} \quad \beta = \max\{g(a), g(b)\}
\]

in Theorem 3.
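Formula (3) lends itself to a direct numerical check. The Python sketch below is an added illustration, not part of the original text. It assumes X uniform on (0, 1) and the two monotone maps $y = e^x$ and $y = -2\log x$ as examples, builds h from f, $g^{-1}$, and $g'$, and uses a simple midpoint rule to confirm that each h integrates to approximately 1. All helper names are hypothetical.

```python
import math


def transformed_pdf(f, g_inv, g_prime):
    """PDF of Y = g(X) for strictly monotone differentiable g, via formula (3)."""
    def h(y):
        x = g_inv(y)
        return f(x) / abs(g_prime(x))
    return h


def uniform_pdf(x):
    """PDF of the uniform distribution on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def midpoint_integral(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


# Y = exp(X): inverse is log y, derivative is exp(x); support (1, e).
h_exp = transformed_pdf(uniform_pdf, math.log, math.exp)

# Y = -2 log X: inverse is exp(-y/2), derivative is -2/x; support (0, inf).
h_log = transformed_pdf(
    uniform_pdf,
    lambda y: math.exp(-y / 2),
    lambda x: -2.0 / x,
)

mass_exp = midpoint_integral(h_exp, 1.0, math.e)
mass_log = midpoint_integral(h_log, 0.0, 60.0)  # tail beyond 60 is negligible
print(mass_exp, mass_log)
```

The absolute value in `transformed_pdf` is what lets one function cover both the increasing and the decreasing case.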


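Example 5. Let X have the density f(x) = 1, 0 < x < 1, and = 0 otherwise.
Let $Y = e^X$. Then X = log Y, and we have

\[
h(y) = \left| \frac{1}{y} \right| \cdot 1, \qquad 0 < \log y < 1,
\]

that is,

\[
h(y) = \begin{cases} \dfrac{1}{y}, & 1 < y < e, \\ 0, & \text{otherwise}. \end{cases}
\]

If $y = -2 \log x$, then $x = e^{-y/2}$ and

\[
h(y) = \left| -\tfrac{1}{2} e^{-y/2} \right| \cdot 1, \qquad 0 < e^{-y/2} < 1,
\]

that is,

\[
h(y) = \begin{cases} \tfrac{1}{2} e^{-y/2}, & 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}
\]


Example 6. Let X be a nonnegative RV of the continuous type with PDF f, and
let $\alpha > 0$. Let $Y = X^\alpha$. Then

\[
P\{X^\alpha \le y\} = \begin{cases} P\{X \le y^{1/\alpha}\} & \text{if } y \ge 0, \\ 0 & \text{if } y < 0. \end{cases}
\]

The PDF of Y is given by

\[
h(y) = f(y^{1/\alpha}) \left| \frac{d}{dy} y^{1/\alpha} \right|
= \begin{cases} \dfrac{1}{\alpha} y^{1/\alpha - 1} f(y^{1/\alpha}), & y > 0, \\ 0, & y \le 0. \end{cases}
\]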


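Example 7. Let X be an RV with PDF

\[
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty.
\]

Let $Y = X^2$. In this case, $g'(x) = 2x$, which is > 0 for x > 0, and < 0 for x < 0, so
that the conditions of Theorem 3 are not satisfied. But for y > 0,

\[
P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\} = F(\sqrt{y}) - F(-\sqrt{y}),
\]

where F is the DF of X. Thus the PDF of Y is given by

\[
h(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} \{f(\sqrt{y}) + f(-\sqrt{y})\}, & y > 0, \\ 0, & y \le 0. \end{cases}
\]

Thus

\[
h(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}} e^{-y/2}, & 0 < y, \\ 0, & y \le 0. \end{cases}
\]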
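Example 8. Let X be an RV with PDF

\[
f(x) = \begin{cases} \dfrac{2x}{\pi^2}, & 0 < x < \pi, \\ 0, & \text{otherwise}. \end{cases}
\]

Let Y = sin X. In this case $g'(x) = \cos x > 0$ for x in $(0, \pi/2)$ and < 0 for x in
$(\pi/2, \pi)$, so that the conditions of Theorem 3 are not satisfied. To compute the PDF
of Y, we return to (1) and see that (Fig. 1) the DF of Y is given by

\[
\begin{aligned}
P\{Y \le y\} &= P\{\sin X \le y\}, \qquad 0 < y < 1, \\
&= P\{[0 \le X \le x_1] \cup [x_2 \le X \le \pi]\},
\end{aligned}
\]

where $x_1 = \sin^{-1} y$ and $x_2 = \pi - \sin^{-1} y$. Thus

\[
P\{Y \le y\} = \int_0^{x_1} f(x)\,dx + \int_{x_2}^{\pi} f(x)\,dx
= \left( \frac{x_1}{\pi} \right)^2 + 1 - \left( \frac{x_2}{\pi} \right)^2,
\]

and the PDF of Y is given by

\[
\begin{aligned}
h(y) &= \frac{d}{dy} \left( \frac{\sin^{-1} y}{\pi} \right)^2 + \frac{d}{dy} \left[ 1 - \left( \frac{\pi - \sin^{-1} y}{\pi} \right)^2 \right] \\
&= \begin{cases} \dfrac{2}{\pi \sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned}
\]

Fig. 1. y = sin x, $0 \le x \le \pi$.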


In Examples 7 and 8 the function y = g(x) can be written as the sum of two
monotone functions. We applied Theorem 3 to each of these monotonic summands.
These two examples are special cases of the following result.


Theorem 4. Let X be an RV of the continuous type with PDF f. Let y = g(x)
be differentiable for all x, and assume that $g'(x)$ is continuous and nonzero at all but
a finite number of values of x. Then for every real number y,

(a) there exist a positive integer n = n(y) and real numbers (inverses) $x_1(y)$,
$x_2(y), \ldots, x_n(y)$ such that

\[
g[x_k(y)] = y \quad \text{and} \quad g'[x_k(y)] \ne 0, \qquad k = 1, 2, \ldots, n(y),
\]

or

(b) there does not exist any x such that g(x) = y, $g'(x) \ne 0$, in which case we
write n(y) = 0.

Then Y is a continuous RV with PDF given by

\[
h(y) = \begin{cases} \sum_{k=1}^{n} f[x_k(y)]\, |g'[x_k(y)]|^{-1} & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases}
\]


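Example 9. Let X be an RV with PDF f, and let Y = |X|. Here n(y) = 2,
$x_1(y) = y$, $x_2(y) = -y$ for y > 0, and

\[
h(y) = \begin{cases} f(y) + f(-y), & y > 0, \\ 0, & y \le 0. \end{cases}
\]

Thus, if f(x) = 1/2, $-1 \le x \le 1$, and = 0 otherwise, then

\[
h(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{otherwise}. \end{cases}
\]

If $f(x) = (1/\sqrt{2\pi})\, e^{-x^2/2}$, $-\infty < x < \infty$, then

\[
h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}} e^{-y^2/2}, & y > 0, \\ 0, & \text{otherwise}. \end{cases}
\]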
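The sum over inverses in Theorem 4 can be coded generically. The Python sketch below is an added illustration, not part of the original text; the helper `multi_root_pdf` is hypothetical, and the caller is assumed to supply the list of inverses $x_k(y)$ by hand. It is applied to $Y = X^2$ and $Y = |X|$ for a standard normal X and compared with the closed forms of Examples 7 and 9.

```python
import math


def multi_root_pdf(f, roots, g_prime):
    """h(y) = sum over inverses x_k(y) of f(x_k) / |g'(x_k)|; 0 if there are none."""
    def h(y):
        return sum(f(x) / abs(g_prime(x)) for x in roots(y))
    return h


def std_normal_pdf(x):
    """Standard normal PDF."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Y = X^2: inverses are +sqrt(y) and -sqrt(y) for y > 0, none otherwise.
h_square = multi_root_pdf(
    std_normal_pdf,
    lambda y: [math.sqrt(y), -math.sqrt(y)] if y > 0 else [],
    lambda x: 2 * x,
)

# Y = |X|: inverses are y and -y for y > 0; g'(x) is +1 or -1 away from 0.
h_abs = multi_root_pdf(
    std_normal_pdf,
    lambda y: [y, -y] if y > 0 else [],
    lambda x: 1.0 if x > 0 else -1.0,
)

print(h_square(1.5), h_abs(0.7))
```

Returning an empty list of inverses reproduces the case n(y) = 0, since the sum of no terms is 0.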


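Example 10. Let X be an RV of the continuous type with PDF f, and let
$Y = X^{2m}$, where m is a positive integer. In this case $g(x) = x^{2m}$, $g'(x) = 2mx^{2m-1} > 0$
for x > 0 and $g'(x) < 0$ for x < 0. Writing n = 2m, we see that for any y > 0,
n(y) = 2, $x_1(y) = -y^{1/n}$, $x_2(y) = y^{1/n}$. It follows that

\[
\begin{aligned}
h(y) &= f[x_1(y)] \cdot \frac{1}{n y^{1 - 1/n}} + f[x_2(y)] \cdot \frac{1}{n y^{1 - 1/n}} \\
&= \begin{cases} \dfrac{1}{n y^{1 - 1/n}} \left[ f(y^{1/n}) + f(-y^{1/n}) \right] & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}
\end{aligned}
\]

In particular, if f is the PDF given in Example 7, then

\[
h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\, n y^{1 - 1/n}} \exp\left\{ -\dfrac{y^{2/n}}{2} \right\} & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}
\]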

Remark 3. The basic formula (1) and the countable additivity of probability allow
us to compute the distribution of Y = g(X) in some instances even if g has a
countable number of inverses. Let $A \subset \mathcal{R}$ and g map A into $B \subset \mathcal{R}$. Suppose that A
can be represented as a countable union of disjoint sets $A_k$, k = 1, 2, .... Then the
DF of Y is given by

\[
\begin{aligned}
P\{Y \le y\} &= P\{X \in g^{-1}(-\infty, y]\} \\
&= P\Bigl\{X \in \bigcup_{k=1}^{\infty} [\{g^{-1}(-\infty, y]\} \cap A_k]\Bigr\} \\
&= \sum_{k=1}^{\infty} P\{X \in A_k \cap \{g^{-1}(-\infty, y]\}\}.
\end{aligned}
\]

If the conditions of Theorem 3 are satisfied by the restriction of g to each $A_k$, we
may obtain the PDF of Y on differentiating the DF of Y. We remind the reader that
term-by-term differentiation is permissible if the differentiated series is uniformly
convergent.


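Example 11. Let X be an RV with PDF

\[
f(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad \theta > 0.
\]

Let Y = sin X, and let $\sin^{-1} y$ be the principal value. Then (Fig. 2) for 0 < y < 1,

\[
\begin{aligned}
P\{\sin X \le y\}
&= P\{0 < X \le \sin^{-1} y \text{ or } (2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y \\
&\qquad \text{for all integers } n \ge 1\} \\
&= P\{0 < X \le \sin^{-1} y\} + \sum_{n=1}^{\infty} P\{(2n - 1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y\} \\
&= 1 - e^{-\theta \sin^{-1} y} + \sum_{n=1}^{\infty} \left( e^{-\theta[(2n - 1)\pi - \sin^{-1} y]} - e^{-\theta(2n\pi + \sin^{-1} y)} \right) \\
&= 1 - e^{-\theta \sin^{-1} y} + \left( e^{\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y} \right) \sum_{n=1}^{\infty} e^{-2\theta\pi n} \\
&= 1 - e^{-\theta \sin^{-1} y} + \left( e^{\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y} \right) \frac{e^{-2\theta\pi}}{1 - e^{-2\theta\pi}} \\
&= 1 + \frac{e^{-\theta\pi + \theta \sin^{-1} y} - e^{-\theta \sin^{-1} y}}{1 - e^{-2\pi\theta}}.
\end{aligned}
\]

Fig. 2. y = sin x, $x \ge 0$.

A similar computation can be made for y < 0. It follows that the PDF of Y is given
by

\[
h(y) = \begin{cases}
\theta e^{-\theta\pi} (1 - e^{-2\theta\pi})^{-1} (1 - y^2)^{-1/2} \left( e^{\theta \sin^{-1} y} + e^{-\theta\pi - \theta \sin^{-1} y} \right) & \text{if } -1 < y < 0, \\
\theta (1 - e^{-2\theta\pi})^{-1} (1 - y^2)^{-1/2} \left( e^{-\theta \sin^{-1} y} + e^{-\theta\pi + \theta \sin^{-1} y} \right) & \text{if } 0 < y < 1, \\
0 & \text{otherwise}.
\end{cases}
\]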
PROBLEMS 2.5


1. Let X be a random variable with probability mass function

\[
P\{X = r\} = \binom{n}{r} p^r (1 - p)^{n - r}, \qquad r = 0, 1, 2, \ldots, n, \quad 0 \le p \le 1.
\]

Find the PMFs of the RVs (a) Y = aX + b, (b) $Y = X^2$, and (c) $Y = \sqrt{X}$.


2. Let X be an RV with PDF
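\[
f(x) = \begin{cases} 0 & \text{if } x \le 0, \\ \dfrac{1}{2} & \text{if } 0 < x \le 1, \\ \dfrac{1}{2x^2} & \text{if } 1 < x < \infty. \end{cases}
\]

Find the PDF of the RV 1/X.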


3. Let X be a positive RV of the continuous type with PDF f(·). Find the PDF of
the RV U = X/(1 + X). If, in particular, X has the PDF

\[
f(x) = \begin{cases} 1, & 0 \le x \le 1, \\ 0, & \text{otherwise}, \end{cases}
\]

what is the PDF of U?


4. Let X be an RV with PDF f defined by Example 11. Let Y = cos X and
Z = tan X. Find the DFs and PDFs of Y and Z.


5. Let X be an RV with PDF

\[
f_\theta(x) = \begin{cases} \theta e^{-\theta x} & \text{if } x \ge 0, \\ 0 & \text{otherwise}, \end{cases}
\]

where $\theta > 0$. Let $Y = (X - 1/\theta)$. Find the PDF of Y.


6. A point is chosen at random on the circumference of a circle of radius r with
center at the origin, that is, the polar angle $\theta$ of the point chosen has the PDF

\[
f(\theta) = \frac{1}{2\pi}, \qquad \theta \in (-\pi, \pi).
\]

Find the PDF of the abscissa of the point selected.


7. For the RV X of Example 7, find the PDF of the following RVs: (a) $Y_1 = e^X$,
(b) $Y_2 = 2X^2 + 1$, and (c) $Y_3 = g(X)$, where g(x) = 1 if x > 0, = 1/2 if x = 0,
and = -1 if x < 0.


8. Suppose that a projectile is fired at an angle $\theta$ above the earth with a velocity V.
Assuming that $\theta$ is an RV with PDF

\[
f(\theta) = \begin{cases} \dfrac{12}{\pi} & \text{if } \dfrac{\pi}{6} < \theta < \dfrac{\pi}{4}, \\ 0 & \text{otherwise}, \end{cases}
\]

find the PDF of the range R of the projectile, where $R = V^2 \sin 2\theta / g$, g being
the gravitational constant.


9. Let X be an RV with PDF $f(x) = 1/(2\pi)$ if $0 < x < 2\pi$, and = 0 otherwise.
Let Y = sin X. Find the DF and PDF of Y.


10. Let X be an RV with PDF f(x) = 1/3 if -1 < x < 2, and = 0 otherwise. Let
Y = |X|. Find the PDF of Y.


11. Let X be an RV with PDF $f(x) = 1/(2\theta)$ if $-\theta \le x \le \theta$, and = 0 otherwise.
Let $Y = 1/X^2$. Find the PDF of Y.

12. Let X be an RV of the continuous type, and let Y = g(X) be defined as follows:
(a) g(x) = 1 if x > 0, and = -1 if $x \le 0$.
(b) g(x) = b if $x \ge b$, = x if |x| < b, and = -b if $x \le -b$.
(c) g(x) = x if $|x| \ge b$, and = 0 if |x| < b.
Find the distribution of Y in each case.


CHAPTER 3 


Moments and Generating Functions 


3.1 INTRODUCTION 


The study of probability distributions of a random variable is essentially the study 
of some numerical characteristics associated with them. These parameters of the 
distribution play a key role in mathematical statistics. In Section 3.2 we introduce 
some of these parameters, namely, moments and order parameters, and investigate 
their properties. In Section 3.3 the idea of generating functions is introduced. In 
particular, we study probability generating functions, moment generating functions, 
and characteristic functions. In Section 3.4 we deal with some moment inequalities. 


3.2 MOMENTS OF A DISTRIBUTION FUNCTION 


In this section we investigate some numerical characteristics, called parameters, as- 
sociated with the distribution of an RV X. These parameters are moments and their 
functions and order parameters. We concentrate mainly on moments and their prop- 
erties. 

Let X be a random variable of the discrete type with probability mass function
$p_k = P\{X = x_k\}$, k = 1, 2, .... If

\[
\sum_{k=1}^{\infty} |x_k| p_k < \infty, \tag{1}
\]

we say that the expected value (or the mean or the mathematical expectation) of X
exists and write

\[
\mu = EX = \sum_{k=1}^{\infty} x_k p_k. \tag{2}
\]

Note that the series $\sum_{k=1}^{\infty} x_k p_k$ may converge but the series $\sum_{k=1}^{\infty} |x_k| p_k$ may
not. In that case we say that EX does not exist.




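Example 1. Let X have the PMF given by

\[
p_j = P\left\{ X = (-1)^{j+1} \frac{3^j}{j} \right\} = \frac{2}{3^j}, \qquad j = 1, 2, \ldots.
\]

Then

\[
\sum_{j=1}^{\infty} |x_j| p_j = \sum_{j=1}^{\infty} \frac{2}{j} = \infty,
\]

and EX does not exist, although the series

\[
\sum_{j=1}^{\infty} x_j p_j = \sum_{j=1}^{\infty} (-1)^{j+1} \frac{2}{j}
\]

is convergent.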


If X is of the continuous type and has PDF f, we say that EX exists and equals
$\int x f(x)\,dx$, provided that

\[
\int |x| f(x)\,dx < \infty.
\]

A similar definition is given for the mean of any Borel-measurable function h(X)
of X. Thus if X is of the continuous type and has PDF f, we say that Eh(X) exists
and equals $\int h(x) f(x)\,dx$, provided that

\[
\int |h(x)| f(x)\,dx < \infty.
\]

We emphasize that the condition $\int |x| f(x)\,dx < \infty$ must be checked before it
can be concluded that EX exists and equals $\int x f(x)\,dx$. Moreover, it is worthwhile
to recall at this point that the integral $\int_{-\infty}^{\infty} \varphi(x)\,dx$ exists, provided that the limit
$\lim_{a \to \infty,\, b \to \infty} \int_{-b}^{a} \varphi(x)\,dx$ exists. It is quite possible for the limit $\lim_{a \to \infty} \int_{-a}^{a} \varphi(x)\,dx$
to exist without the existence of $\int_{-\infty}^{\infty} \varphi(x)\,dx$. As an example, consider the Cauchy
PDF:

\[
f(x) = \frac{1}{\pi} \frac{1}{1 + x^2}, \qquad -\infty < x < \infty.
\]

Clearly,

\[
\lim_{a \to \infty} \int_{-a}^{a} \frac{x}{\pi} \frac{1}{1 + x^2}\,dx = 0.
\]


However, EX does not exist since the integral $(1/\pi) \int_{-\infty}^{\infty} |x|/(1 + x^2)\,dx$ diverges.


Remark 1. Let $X(\omega) = I_A(\omega)$ for some $A \in \mathcal{S}$. Then EX = P(A).


Remark 2. If we write h(X) = |X|, we see that EX exists if and only if E|X|
does.


Remark 3. We say that an RV X is symmetric about a point $\alpha$ if

\[
P\{X \ge \alpha + x\} = P\{X \le \alpha - x\} \qquad \text{for all } x.
\]

In terms of the DF F of X, this means that if

\[
F(\alpha - x) = 1 - F(\alpha + x) + P\{X = \alpha + x\}
\]

holds for all $x \in \mathcal{R}$, we say that the DF F (or the RV X) is symmetric with $\alpha$ as the
center of symmetry. If $\alpha = 0$, then for every x,

\[
F(-x) = 1 - F(x) + P\{X = x\}.
\]

In particular, if X is an RV of the continuous type, X is symmetric with center $\alpha$ if
and only if the PDF f of X satisfies

\[
f(\alpha - x) = f(\alpha + x) \qquad \text{for all } x.
\]

If $\alpha = 0$, we will say simply that X is symmetric (or that F is symmetric).

As an immediate consequence of this definition we see that if X is symmetric with
$\alpha$ as the center of symmetry and $E|X| < \infty$, then $EX = \alpha$. A simple example of a
symmetric distribution is the Cauchy PDF considered above (before Remark 1). We
will encounter many such distributions later.
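The Cauchy example shows why symmetry alone does not give a mean. The Python sketch below is an added illustration, not part of the original text. It relies on the elementary antiderivative $\int x/(1 + x^2)\,dx = \tfrac{1}{2}\log(1 + x^2)$; the function names are hypothetical. The symmetric truncation is 0 for every a, an asymmetric truncation over [-a, 2a] settles at $(\log 2)/\pi$ instead, and the absolute integral grows without bound.

```python
import math


def truncated_mean(lower, upper):
    """Integral of x / (pi (1 + x^2)) over [-lower, upper], in closed form."""
    return (math.log1p(upper ** 2) - math.log1p(lower ** 2)) / (2 * math.pi)


def truncated_abs_mean(a):
    """Integral of |x| / (pi (1 + x^2)) over [-a, a], in closed form."""
    return math.log1p(a ** 2) / math.pi


for a in (1e2, 1e4, 1e6):
    # symmetric limits, asymmetric limits, absolute integral
    print(a, truncated_mean(a, a), truncated_mean(a, 2 * a), truncated_abs_mean(a))
```

Because the value of the truncated integral depends on how the two limits go to infinity, the improper integral does not exist, which is the reason the condition $E|X| < \infty$ is needed before concluding $EX = \alpha$.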


Remark 4. If a and b are constants and X is an RV with $E|X| < \infty$, then
$E|aX + b| < \infty$ and $E\{aX + b\} = aEX + b$. In particular, $E\{X - \mu\} = 0$, a
fact that should not come as a surprise.


Remark 5. If X is bounded, that is, if $P\{|X| < M\} = 1$, $0 < M < \infty$, then EX
exists.


Remark 6. If $P\{X \ge 0\} = 1$ and EX exists, then $EX \ge 0$.


Theorem 1. Let X be an RV and g be a Borel-measurable function on $\mathcal{R}$. Let
Y = g(X). If X is of discrete type, then

\[
EY = \sum_{j=1}^{\infty} g(x_j) P\{X = x_j\} \tag{3}
\]

in the sense that if either side of (3) exists, so does the other, and then the two are
equal. If X is of continuous type with PDF f, then $EY = \int g(x) f(x)\,dx$ in the
sense that if either of the two integrals converges absolutely, so does the other, and
the two are equal.


Remark 7. Let X be a discrete RV. Then Theorem 1 says that

\[
\sum_{j=1}^{\infty} g(x_j) P\{X = x_j\} = \sum_{k=1}^{\infty} y_k P\{Y = y_k\}
\]

in the sense that if either of the two series converges absolutely, so does the other,
and the two sums are equal. If X is of the continuous type with PDF f, let h(y) be
the PDF of Y = g(X). Then, according to Theorem 1,

\[
\int g(x) f(x)\,dx = \int y h(y)\,dy,
\]

provided that $E|g(X)| < \infty$.


Proof of Theorem 1. In the discrete case, suppose that $P\{X \in A\} = 1$. If y =
g(x) is a one-to-one mapping of A onto some set B, then

\[
P\{Y = y\} = P\{X = g^{-1}(y)\}, \qquad y \in B.
\]

We have

\[
\sum_{x \in A} g(x) P\{X = x\} = \sum_{y \in B} y P\{Y = y\}.
\]

In the continuous case, suppose that g satisfies the conditions of Theorem 2.5.3. Then

\[
\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y f[g^{-1}(y)] \left| \frac{d}{dy} g^{-1}(y) \right| dy
\]

by changing the variable to y = g(x). Thus

\[
\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y h(y)\,dy.
\]


The functions $h(x) = x^n$, where n is a positive integer, and $h(x) = |x|^\alpha$, where $\alpha$
is a positive real number, are of special importance. If $EX^n$ exists for some positive
integer n, we call $EX^n$ the nth moment of (the distribution function of) X about
the origin. If $E|X|^\alpha < \infty$ for some positive real number $\alpha$, we call $E|X|^\alpha$ the $\alpha$th
absolute moment of X. We shall use the notation

\[
m_n = EX^n \quad \text{and} \quad \beta_\alpha = E|X|^\alpha \tag{4}
\]

whenever the expectations exist.


Example 2. Let X have the uniform distribution on the first N natural numbers;
that is, let

\[
P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N.
\]

Clearly, moments of all orders exist:

\[
EX = \sum_{k=1}^{N} k \cdot \frac{1}{N} = \frac{N + 1}{2},
\qquad
EX^2 = \sum_{k=1}^{N} k^2 \cdot \frac{1}{N} = \frac{(N + 1)(2N + 1)}{6}.
\]
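The two closed forms for the discrete uniform distribution can be confirmed by direct summation. The Python sketch below is an added illustration, not part of the original text; the function name `uniform_moment` and the choice N = 10 are arbitrary.

```python
from fractions import Fraction


def uniform_moment(n_points, order):
    """Exact moment E X^order for X uniform on {1, ..., n_points}."""
    return sum(Fraction(k ** order, n_points) for k in range(1, n_points + 1))


N = 10  # illustrative choice
first = uniform_moment(N, 1)   # compare with (N + 1) / 2
second = uniform_moment(N, 2)  # compare with (N + 1)(2N + 1) / 6
print(first, second)
```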
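Example 3. Let X be an RV with PDF

\[
f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1, \\ 0, & x < 1. \end{cases}
\]

Then

\[
EX = \int_1^{\infty} \frac{2}{x^2}\,dx = 2.
\]

But

\[
EX^2 = \int_1^{\infty} \frac{2}{x}\,dx
\]

does not exist. Indeed, it is easily possible to construct examples of random variables
for which all moments of a specified order exist but no higher-order moments do.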


Example 4. Two players, A and B, play a coin-tossing game. A gives B one 
dollar if a head turns up; otherwise, B pays A one dollar. If the probability that the 
coin shows a head is p, find the expected gain of A. 

Let X denote the gain of A. Then 

\[
P\{X = 1\} = P\{\text{tails}\} = 1 - p, \qquad P\{X = -1\} = p,
\]

and

\[
EX = 1 - p - p = 1 - 2p \quad \begin{cases} > 0 & \text{if and only if } p < \tfrac{1}{2}, \\ = 0 & \text{if and only if } p = \tfrac{1}{2}. \end{cases}
\]

Thus EX = 0 if and only if the coin is fair.


Theorem 2. If the moment of order t exists for an RV X, moments of order 0 <
s < t exist.


Proof. Let X be of the continuous type with PDF f. We have

\[
\begin{aligned}
E|X|^s &= \int_{|x|^s \le 1} |x|^s f(x)\,dx + \int_{|x|^s > 1} |x|^s f(x)\,dx \\
&\le P\{|X|^s \le 1\} + E|X|^t < \infty.
\end{aligned}
\]

A similar proof can be given when X is a discrete RV.


Theorem 3. Let X be an RV on a probability space $(\Omega, \mathcal{S}, P)$. Let $E|X|^k < \infty$
for some k > 0. Then

\[
n^k P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty.
\]


Proof. We provide the proof for the case in which X is of the continuous type
with density f. We have

\[
\infty > \int |x|^k f(x)\,dx = \lim_{n \to \infty} \int_{|x| \le n} |x|^k f(x)\,dx.
\]

It follows that

\[
\int_{|x| > n} |x|^k f(x)\,dx \to 0 \qquad \text{as } n \to \infty.
\]

But

\[
\int_{|x| > n} |x|^k f(x)\,dx \ge n^k P\{|X| > n\},
\]

completing the proof.


Remark 8. Probabilities of the type $P\{|X| > n\}$ or either of its components,
$P\{X > n\}$ or $P\{X < -n\}$, are called tail probabilities. The result of Theorem 3,
therefore, gives the rate at which $P\{|X| > n\}$ converges to 0 as $n \to \infty$.

Remark 9. The converse of Theorem 3 does not hold in general; that is,

\[
n^k P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty \text{ for some } k
\]

does not necessarily imply that $E|X|^k < \infty$, for consider the RV

\[
P\{X = n\} = \frac{c}{n^2 \log n}, \qquad n = 2, 3, \ldots,
\]

where c is a constant determined from

\[
\sum_{n=2}^{\infty} \frac{c}{n^2 \log n} = 1.
\]

We have

\[
P\{X > n\} \approx c \int_n^{\infty} \frac{1}{x^2 \log x}\,dx \approx c n^{-1} (\log n)^{-1}
\]

and $nP\{X > n\} \to 0$ as $n \to \infty$. (Here and subsequently, $\approx$ means that the ratio of
two sides $\to 1$ as $n \to \infty$.) But

\[
EX = \sum_{n=2}^{\infty} \frac{c}{n \log n} = \infty.
\]

In fact, we need

\[
n^{k + \delta} P\{|X| > n\} \to 0 \qquad \text{as } n \to \infty
\]

for some $\delta > 0$ to ensure that $E|X|^k < \infty$. A condition such as this is called a
moment condition.


For the proof we need the following lemma.

Lemma 1. Let X be a nonnegative RV with distribution function F. Then

\[
EX = \int_0^{\infty} [1 - F(x)]\,dx, \tag{5}
\]

in the sense that if either side exists, so does the other and the two are equal.

Proof. If X is of the continuous type with density f and $EX < \infty$, then

\[
EX = \int_0^{\infty} x f(x)\,dx = \lim_{n \to \infty} \int_0^n x f(x)\,dx.
\]

On integration by parts, we obtain

\[
\int_0^n x f(x)\,dx = nF(n) - \int_0^n F(x)\,dx = -n[1 - F(n)] + \int_0^n [1 - F(x)]\,dx.
\]

But

\[
n[1 - F(n)] = n \int_n^{\infty} f(x)\,dx < \int_n^{\infty} x f(x)\,dx,
\]

and since $E|X| < \infty$, it follows that

\[
n[1 - F(n)] \to 0 \qquad \text{as } n \to \infty.
\]

We have

\[
EX = \lim_{n \to \infty} \int_0^n x f(x)\,dx = \lim_{n \to \infty} \int_0^n [1 - F(x)]\,dx = \int_0^{\infty} [1 - F(x)]\,dx.
\]

If $\int_0^{\infty} [1 - F(x)]\,dx < \infty$, then

\[
\int_0^n x f(x)\,dx \le \int_0^n [1 - F(x)]\,dx \le \int_0^{\infty} [1 - F(x)]\,dx,
\]

and it follows that $E|X| < \infty$.
We leave the reader to complete the proof in the discrete case. 
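For a nonnegative discrete RV with finitely many support points, 1 - F is a step function and the integral in (5) is a finite sum of rectangle areas. The Python sketch below is an added illustration of that case, not part of the original text; the function names and the three-point PMF are hypothetical.

```python
from fractions import Fraction


def mean_from_pmf(pmf):
    """EX computed directly as the sum of x P{X = x}."""
    return sum(x * p for x, p in pmf.items())


def mean_from_tail(pmf):
    """Integral of 1 - F(x) over (0, inf) for a nonnegative RV with finite support.

    Between consecutive support points 1 - F is constant, so the integral
    is a sum of rectangle areas.
    """
    total, left, tail = Fraction(0), Fraction(0), Fraction(1)
    for x in sorted(pmf):
        total += tail * (x - left)  # 1 - F equals `tail` on [left, x)
        tail -= pmf[x]              # drop the mass at x
        left = x
    return total


# Hypothetical PMF on {0, 1/2, 3}, used only for illustration.
pmf = {
    Fraction(0): Fraction(1, 10),
    Fraction(1, 2): Fraction(2, 5),
    Fraction(3): Fraction(1, 2),
}
print(mean_from_pmf(pmf), mean_from_tail(pmf))
```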


Corollary 1. For any RV X, $E|X| < \infty$ if and only if the integrals
$\int_{-\infty}^{0} P\{X \le x\}\,dx$ and $\int_0^{\infty} P\{X > x\}\,dx$ both converge, and in that case

\[
EX = \int_0^{\infty} P\{X > x\}\,dx - \int_{-\infty}^{0} P\{X \le x\}\,dx.
\]


Actually, we can get a little more out of Lemma 1 than the corollary above. In
fact,

\[
E|X|^\alpha = \int_0^{\infty} P\{|X|^\alpha > x\}\,dx = \alpha \int_0^{\infty} x^{\alpha - 1} P\{|X| > x\}\,dx,
\]

and we see that an RV X possesses an absolute moment of order $\alpha > 0$ if and only if
$|x|^{\alpha - 1} P\{|X| > x\}$ is integrable over $(0, \infty)$.
A simple application of the integral test leads to the following moments lemma.

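Lemma 2.

\[
E|X|^\alpha < \infty \iff \sum_{n=1}^{\infty} P\{|X| > n^{1/\alpha}\} < \infty. \tag{6}
\]

Note that an immediate consequence of Lemma 2 is Theorem 3. We are now ready
to prove the following result.


Theorem 4. Let X be an RV with a distribution satisfying $n^\alpha P\{|X| > n\} \to 0$
as $n \to \infty$ for some $\alpha > 0$. Then $E|X|^\beta < \infty$ for $0 < \beta < \alpha$.


Proof. Given $\varepsilon > 0$, we can choose an $N = N(\varepsilon)$ such that

\[
P\{|X| > n\} < \frac{\varepsilon}{n^\alpha} \qquad \text{for all } n \ge N.
\]

It follows that for $0 < \beta < \alpha$,

\[
\begin{aligned}
E|X|^\beta &= \beta \int_0^N x^{\beta - 1} P\{|X| > x\}\,dx + \beta \int_N^{\infty} x^{\beta - 1} P\{|X| > x\}\,dx \\
&\le N^\beta + \beta\varepsilon \int_N^{\infty} x^{\beta - \alpha - 1}\,dx \\
&< \infty.
\end{aligned}
\]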


Remark 10. Using Theorems 3 and 4, we demonstrate the existence of random
variables for which moments of any order do not exist, that is, for which $E|X|^\alpha = \infty$
for every $\alpha > 0$. For such an RV $n^\alpha P\{|X| > n\} \not\to 0$ as $n \to \infty$ for any $\alpha > 0$.
Consider, for example, the RV X with PDF

\[
f(x) = \begin{cases} \dfrac{1}{2|x| (\log |x|)^2} & \text{for } |x| > e, \\ 0 & \text{otherwise}. \end{cases}
\]

The DF of X is given by

\[
F(x) = \begin{cases} \dfrac{1}{2 \log |x|} & \text{if } x \le -e, \\ \dfrac{1}{2} & \text{if } -e < x < e, \\ 1 - \dfrac{1}{2 \log x} & \text{if } x \ge e. \end{cases}
\]

Then for x > e,

\[
P\{|X| > x\} = 1 - F(x) + F(-x) = \frac{1}{2 \log x} + \frac{1}{2 \log x} = \frac{1}{\log x},
\]

and $x^\alpha P\{|X| > x\} \to \infty$ as $x \to \infty$ for any $\alpha > 0$. It follows that $E|X|^\alpha = \infty$ for
every $\alpha > 0$. In this example we see that $P\{|X| > cx\}/P\{|X| > x\} \to 1$ as $x \to \infty$
for every c > 0. A positive function L(·) defined on $(0, \infty)$ is said to be a function of
slow variation if and only if $L(cx)/L(x) \to 1$ as $x \to \infty$ for every c > 0. For such
a function $x^\alpha L(x) \to \infty$ for every $\alpha > 0$ (see Feller [23, pp. 275-279]). It follows
that if $P\{|X| > x\}$ is slowly varying, $E|X|^\alpha = \infty$ for every $\alpha > 0$. Functions of
slow variation play an important role in the theory of probability.


Random variables for which P{|X| > x} is slowly varying are clearly excluded 
from the domain of the following result. 


Theorem 5. Let X be an RV satisfying

\[
\frac{P\{|X| > cx\}}{P\{|X| > x\}} \to 0 \qquad \text{as } x \to \infty \text{ for all } c > 1; \tag{7}
\]

then X possesses moments of all orders. [Note that if c = 1, the limit in (7) is 1,
whereas if c < 1, the limit will not go to 0 since $P\{|X| > cx\} \ge P\{|X| > x\}$.]


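Proof. Let $\varepsilon > 0$ (we will choose $\varepsilon$ later), choose $x_0$ so large that

\[
\frac{P\{|X| > cx\}}{P\{|X| > x\}} < \varepsilon \qquad \text{for all } x \ge x_0, \tag{8}
\]

and choose $x_1$ so large that

\[
P\{|X| > x\} < \varepsilon \qquad \text{for all } x \ge x_1. \tag{9}
\]

Let $N = \max(x_0, x_1)$. We have for a fixed positive integer r,

\[
\frac{P\{|X| > c^r x\}}{P\{|X| > x\}} = \prod_{p=1}^{r} \frac{P\{|X| > c^p x\}}{P\{|X| > c^{p-1} x\}} \le \varepsilon^r \tag{10}
\]

for $x \ge N$. Thus for $x \ge N$ we have, in view of (9),

\[
P\{|X| > c^r x\} \le \varepsilon^{r+1}. \tag{11}
\]

Next note that for any fixed positive integer n,

\[
\begin{aligned}
E|X|^n &= n \int_0^{\infty} x^{n-1} P\{|X| > x\}\,dx \\
&= n \int_0^N x^{n-1} P\{|X| > x\}\,dx + n \int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx.
\end{aligned} \tag{12}
\]

Since the first integral in (12) is finite, we need only show that the second integral is
also finite. We have

\[
\begin{aligned}
\int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx &= \sum_{r=1}^{\infty} \int_{c^{r-1}N}^{c^r N} x^{n-1} P\{|X| > x\}\,dx \\
&\le \sum_{r=1}^{\infty} (c^r N)^{n-1} \cdot \varepsilon^r \cdot 2c^r N \\
&= 2N^n \sum_{r=1}^{\infty} (\varepsilon c^n)^r \\
&= 2N^n \frac{\varepsilon c^n}{1 - \varepsilon c^n} < \infty,
\end{aligned}
\]

provided that we choose $\varepsilon$ such that $\varepsilon c^n < 1$. It follows that $E|X|^n < \infty$ for n =
1, 2, .... Actually, we have shown that (7) implies that $E|X|^\delta < \infty$ for all $\delta > 0$.


Theorem 6. If $h_1, h_2, \ldots, h_n$ are Borel-measurable functions of an RV X
and $Eh_i(X)$ exists for i = 1, 2, ..., n, then $E\left[\sum_{i=1}^{n} h_i(X)\right]$ exists and equals
$\sum_{i=1}^{n} Eh_i(X)$.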


Definition 1. Let k be a positive integer and c be a constant. If $E(X - c)^k$ exists,
we call it the moment of order k about the point c. If we take $c = EX = \mu$, which
exists since $E|X| < \infty$, we call $E(X - \mu)^k$ the central moment of order k or the
moment of order k about the mean. We shall write

\[
\mu_k = E(X - \mu)^k.
\]

If we know $m_1, m_2, \ldots, m_k$, we can compute $\mu_1, \mu_2, \ldots, \mu_k$, and conversely.
We have

\[
\mu_k = E(X - \mu)^k = m_k - \binom{k}{1} \mu m_{k-1} + \binom{k}{2} \mu^2 m_{k-2} - \cdots + (-1)^k \mu^k \tag{13}
\]

and

\[
m_k = E(X - \mu + \mu)^k = \mu_k + \binom{k}{1} \mu \mu_{k-1} + \binom{k}{2} \mu^2 \mu_{k-2} + \cdots + \mu^k. \tag{14}
\]
The case k = 2 is of special importance. 

Definition 2. If $EX^2$ exists, we call $E(X - \mu)^2$ the variance of X, and we write
$\sigma^2 = \operatorname{var}(X) = E(X - \mu)^2$. The quantity $\sigma$ is called the standard deviation (SD)
of X.

From Theorem 6 we see that

\[
\sigma^2 = \mu_2 = EX^2 - (EX)^2. \tag{15}
\]

Variance has some important properties.

Theorem 7. Var(X) = 0 if and only if X is degenerate.

Theorem 8. $\operatorname{Var}(X) < E(X - c)^2$ for any $c \ne EX$.

Proof. We have

\[
\operatorname{var}(X) = E(X - \mu)^2 = E(X - c)^2 - (c - \mu)^2.
\]

Note that

\[
\operatorname{var}(aX + b) = a^2 \operatorname{var}(X).
\]

Let $E|X|^2 < \infty$. Then we define

\[
Z = \frac{X - EX}{\sqrt{\operatorname{var}(X)}} = \frac{X - \mu}{\sigma} \tag{16}
\]

and see that EZ = 0 and var(Z) = 1. We call Z a standardized RV.


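Example 5. Let X be an RV with binomial PMF

\[
P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n; \quad 0 < p < 1.
\]

Then

\[
\begin{aligned}
EX &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k} \\
&= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1 - p)^{n-k} \\
&= np; \\
EX^2 &= E[X(X - 1) + X] \\
&= \sum_{k=0}^{n} k(k - 1) \binom{n}{k} p^k (1 - p)^{n-k} + np \\
&= n(n - 1)p^2 + np; \\
\operatorname{var}(X) &= n(n - 1)p^2 + np - n^2 p^2 \\
&= np(1 - p); \\
EX^3 &= E[X(X - 1)(X - 2) + 3X(X - 1) + X] \\
&= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np;
\end{aligned}
\]

and

\[
\begin{aligned}
\mu_3 &= m_3 - 3\mu m_2 + 2\mu^3 \\
&= n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np - 3np[n(n - 1)p^2 + np] + 2n^3 p^3 \\
&= np(1 - p)(1 - 2p).
\end{aligned}
\]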
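The binomial formulas above can be verified by brute-force summation over the PMF. The Python sketch below is an added illustration, not part of the original text; the function name `binomial_moments` and the parameter values n = 7, p = 3/10 are arbitrary choices, and exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (mean, variance, third central moment) of a binomial(n, p) RV."""
    p = Fraction(p)
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    m1 = sum(k * q for k, q in enumerate(pmf))
    m2 = sum(k ** 2 * q for k, q in enumerate(pmf))
    m3 = sum(k ** 3 * q for k, q in enumerate(pmf))
    variance = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3  # relation between central and raw moments
    return m1, variance, mu3


n, p = 7, Fraction(3, 10)  # illustrative parameter values
mean, variance, mu3 = binomial_moments(n, p)
print(mean, variance, mu3)
```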


In the example above we computed factorial moments EX(X — 1)(X — 2)--- 
(X —k +1) for various values of k. For some discrete integer-valued RVs whose PMF 
contains factorials or binomial coefficients, it may be more convenient to compute 
factorial moments. 

We have seen that for some distributions, even the mean does not exist. We next
consider some parameters, called order parameters, which always exist.

Fig. 1. Quantile of order p.

Definition 3. A number x (Fig. 1) satisfying

\[
P\{X \le x\} \ge p, \qquad P\{X \ge x\} \ge 1 - p, \qquad 0 < p < 1, \tag{17}
\]

is called a quantile of order p [or (100p)th percentile] for the RV X (or for the DF
F of X). We write $\mathfrak{z}_p(X)$ for a quantile of order p for the RV X.


If x is a quantile of order p for an RV X with DF F, then

\[
p \le F(x) \le p + P\{X = x\}. \tag{18}
\]


If P{X = x} = 0, as is the case—in particular, if X is of the continuous type—a 
quantile of order p is a solution of the equation 


(19) F(x) = p. 

If F is strictly increasing, (19) has a unique solution. Otherwise (Fig. 2), there may 

be many (even uncountably many) solutions of (19), each of which is then called a 

quantile of order p. Quantiles are of great deal of interest in testing hypotheses. 
Definition 4. Let X be an RV with DF F. A number x satisfying

$$\tfrac{1}{2} \le F(x) \le \tfrac{1}{2} + P\{X = x\} \tag{20}$$

or, equivalently,

$$P\{X \le x\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X \ge x\} \ge \tfrac{1}{2} \tag{21}$$

is called a median of X (or F).

Fig. 2. (a) Unique quantile; (b) infinitely many solutions of F(x) = p.


Again we note that there may be many values that satisfy (20) or (21). Thus a 
median is not necessarily unique. 

If F is a symmetric DF, the center of symmetry is clearly the median of the DF F.
The median is an important centering constant, especially in cases where the mean
of the distribution does not exist.

Example 6. Let X be an RV with Cauchy PDF

$$f(x) = \frac{1}{\pi} \, \frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$

Then $E|X|$ is not finite, but $E|X|^{\delta} < \infty$ for $0 < \delta < 1$. The median of the RV X is
clearly x = 0.


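Example 7. Let X be an RV with PMF

$$P\{X = -2\} = P\{X = 0\} = \tfrac{1}{4}, \qquad P\{X = 1\} = \tfrac{1}{3}, \qquad P\{X = 2\} = \tfrac{1}{6}.$$

Then

$$P\{X \le 0\} = \tfrac{1}{2} \quad \text{and} \quad P\{X \ge 0\} = \tfrac{3}{4} > \tfrac{1}{2}.$$

In fact, if x is any number such that 0 < x < 1, then

$$P\{X \le x\} = P\{X = -2\} + P\{X = 0\} = \tfrac{1}{2}$$

and

$$P\{X \ge x\} = P\{X = 1\} + P\{X = 2\} = \tfrac{1}{2},$$

and it follows that every x, $0 \le x \le 1$, is a median of the RV X.
If p = 0.2, the quantile of order p is x = −2, since

$$P\{X \le -2\} = \tfrac{1}{4} > p \quad \text{and} \quad P\{X \ge -2\} = 1 > 1 - p.$$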
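Definition 3 can be turned directly into a membership test for a discrete RV. The sketch below is illustrative only: the function name `is_quantile` is hypothetical, and the PMF values (1/4, 1/4, 1/3, 1/6 on the points −2, 0, 1, 2) are the ones assumed for Example 7.

```python
from fractions import Fraction

# PMF assumed for Example 7: value -> probability.
pmf = {
    -2: Fraction(1, 4),
    0: Fraction(1, 4),
    1: Fraction(1, 3),
    2: Fraction(1, 6),
}


def is_quantile(pmf, x, p):
    """Check the two conditions P{X <= x} >= p and P{X >= x} >= 1 - p."""
    left = sum(q for v, q in pmf.items() if v <= x)
    right = sum(q for v, q in pmf.items() if v >= x)
    return left >= p and right >= 1 - p


half = Fraction(1, 2)
# Every point of [0, 1] passes the median test; points outside do not.
medians_checked = [is_quantile(pmf, x, half)
                   for x in (0, Fraction(1, 3), Fraction(1, 2), 1)]
# x = -2 is a quantile of order p = 0.2.
order_02 = is_quantile(pmf, -2, Fraction(1, 5))
```

Because the check uses both one-sided probabilities, it handles jump points of the DF correctly, which a test based on F(x) = p alone would not.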


PROBLEMS 3.2 


1. Find the expected number of throws of a fair die until a 6 is obtained. 


2. From a box containing N identical tickets numbered 1 through N, n tickets are 
drawn with replacement. Let X be the largest number drawn. Find EX. 


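3. Let X be an RV with PDF

$$f(x) = \frac{c}{(1 + x^2)^m}, \qquad -\infty < x < \infty, \quad m \ge 1,$$

where $c = \Gamma(m) / [\Gamma(\tfrac{1}{2}) \Gamma(m - \tfrac{1}{2})]$. Show that $EX^{2r}$ exists if and only if 2r <
2m − 1. What is $EX^{2r}$ if 2r < 2m − 1?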


4. Let X be an RV with PDF

$$f(x) = \begin{cases} \dfrac{k a^k}{(x + a)^{k+1}} & \text{if } x \ge 0, \\ 0 & \text{otherwise} \end{cases} \qquad (a > 0).$$

Show that $E|X|^{\alpha} < \infty$ for $\alpha < k$. Find the quantile of order p for the RV X.


5. Let X be an RV such that E|X| < oo. Show that E|X — c| is minimized if we 
choose c equal to the median of the distribution of X. 


6. Pareto's distribution with parameters α and β (both α and β positive) is defined
by the PDF

$$f(x) = \begin{cases} \dfrac{\beta \alpha^{\beta}}{x^{\beta + 1}} & \text{if } x \ge \alpha, \\ 0 & \text{if } x < \alpha. \end{cases}$$

Show that the moment of order n exists if and only if n < β. Let β > 2. Find
the mean and the variance of the distribution.


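7. For an RV X with PDF

$$f(x) = \begin{cases} \tfrac{1}{2} x & \text{if } 0 \le x < 1, \\ \tfrac{1}{2} & \text{if } 1 < x \le 2, \\ \tfrac{1}{2}(3 - x) & \text{if } 2 < x \le 3, \end{cases}$$

show that moments of all order exist. Find the mean and the variance of X.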


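8. For the PMF of Example 5, show that

$$EX^4 = np + 7n(n - 1)p^2 + 6n(n - 1)(n - 2)p^3 + n(n - 1)(n - 2)(n - 3)p^4$$

and

$$\mu_4 = 3(npq)^2 + npq(1 - 6pq),$$

where 0 ≤ p ≤ 1, q = 1 − p.

9. For the Poisson RV X with PMF

$$P\{X = x\} = e^{-\lambda} \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots,$$

show that $EX = \lambda$, $EX^2 = \lambda + \lambda^2$, $EX^3 = \lambda + 3\lambda^2 + \lambda^3$, $EX^4 = \lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4$, and $\mu_2 = \mu_3 = \lambda$, $\mu_4 = \lambda + 3\lambda^2$.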
10. For any RV X with $E|X|^4 < \infty$, define

$$\alpha_3 = \frac{\mu_3}{(\mu_2)^{3/2}}, \qquad \alpha_4 = \frac{\mu_4}{\mu_2^2}.$$

Here $\alpha_3$ is known as the coefficient of skewness and is sometimes used as a
measure of asymmetry, and $\alpha_4$ is known as kurtosis and is used to measure the
peakedness ("flatness of the top") of a distribution. Compute $\alpha_3$ and $\alpha_4$ for the
PMFs of Problems 8 and 9.

11. For a positive RV X define the negative moment of order n by $EX^{-n}$, where
n > 0 is an integer. Find E[1/(X + 1)] for the PMFs of Example 5 and Problem 9.


12. Prove Theorem 6. 
13. Prove Theorem 7. 
14. In each of the following cases, compute EX, var(X), and $EX^n$ (for n ≥ 0, an
integer) whenever they exist:
(a) $f(x) = 1$, $-\tfrac{1}{2} \le x \le \tfrac{1}{2}$, and zero elsewhere.
(b) $f(x) = e^{-x}$, $x \ge 0$, and zero elsewhere.
(c) $f(x) = (k - 1)/x^k$, $x \ge 1$, and zero elsewhere; k > 1 is a constant.
(d) $f(x) = 1/[\pi(1 + x^2)]$, $-\infty < x < \infty$.
(e) $f(x) = 6x(1 - x)$, 0 < x < 1, and zero elsewhere.
(f) $f(x) = x e^{-x}$, $x \ge 0$, and zero elsewhere.
(g) $P(X = x) = p(1 - p)^{x-1}$, x = 1, 2, ..., and zero elsewhere; 0 < p < 1.
15. Find the quantile of order p (0 < p < 1) for the following distributions.
(a) $f(x) = 1/x^2$, $x \ge 1$, and zero elsewhere.
(b) $f(x) = 2x \exp(-x^2)$, $x \ge 0$, and zero otherwise.
(c) $f(x) = 1/\theta$, $0 \le x \le \theta$, and zero elsewhere.
(d) $P(X = x) = \theta(1 - \theta)^{x-1}$, x = 1, 2, ..., and zero otherwise; 0 < θ < 1.
(e) $f(x) = (1/\beta^2)\, x \exp(-x/\beta)$, x > 0, and zero otherwise; β > 0.
(f) $f(x) = (3/b^3)(b - x)^2$, 0 < x < b, and zero elsewhere.


3.3 GENERATING FUNCTIONS 


In this section we consider some functions that generate probabilities or moments 
of an RV. The simplest type of generating function in probability theory is the one 
associated with integer-valued RVs. Let X be an RV, and let 


$$p_k = P\{X = k\}, \qquad k = 0, 1, 2, \ldots$$

Definition 1. The function defined by

$$P(s) = \sum_{k=0}^{\infty} p_k s^k, \tag{1}$$

which surely converges for $|s| \le 1$, is called the probability generating function
(PGF) of X.


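Example 1. Consider the Poisson RV with PMF

$$P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots.$$

We have

$$P(s) = \sum_{k=0}^{\infty} s^k e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} e^{s\lambda} = e^{-\lambda(1 - s)} \qquad \text{for all } s.$$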


Example 2. Let X be an RV with geometric distribution, that is, let

$$P\{X = k\} = p q^k, \qquad k = 0, 1, 2, \ldots; \quad 0 < p < 1, \quad q = 1 - p.$$

Then

$$P(s) = \sum_{k=0}^{\infty} s^k p q^k = \frac{p}{1 - sq}, \qquad |s| \le 1.$$

Remark 1. Since P(1) = 1, series (1) is uniformly and absolutely convergent in
$|s| \le 1$ and the PGF P is a continuous function of s. It determines the PMF uniquely,
since P(s) can be represented in a unique manner as a power series.


Remark 2. Since a power series with radius of convergence r can be differentiated
termwise any number of times in (−r, r), it follows that

$$P^{(k)}(s) = \sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1) P(X = n) s^{n-k},$$

where $P^{(k)}$ is the kth derivative of P. The series converges at least for −1 < s < 1.
For s = 1 the right side reduces formally to $E[X(X - 1) \cdots (X - k + 1)]$, which
is the kth factorial moment of X whenever it exists. In particular, if $EX < \infty$,
then $P'(1) = EX$, and if $EX^2 < \infty$, then $P''(1) = EX(X - 1)$ and
$\operatorname{var}(X) = EX^2 - (EX)^2 = P''(1) - [P'(1)]^2 + P'(1)$.


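Example 3. In Example 1 we found that $P(s) = e^{-\lambda(1 - s)}$, $|s| \le 1$, for a Poisson
RV. Thus

$$P'(s) = \lambda e^{-\lambda(1 - s)},$$

$$P''(s) = \lambda^2 e^{-\lambda(1 - s)}.$$

Also, $EX = \lambda$, $E(X^2 - X) = \lambda^2$, so that $\operatorname{var}(X) = EX^2 - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.

In Example 2 we computed P(s) = p/(1 − sq), so that

$$P'(s) = \frac{pq}{(1 - sq)^2} \quad \text{and} \quad P''(s) = \frac{2pq^2}{(1 - sq)^3}.$$

Thus

$$EX = \frac{q}{p}, \qquad EX^2 = \frac{q}{p} + \frac{2pq^2}{p^3}, \qquad \text{and} \qquad \operatorname{var}(X) = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2}.$$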
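The relation between derivatives of the PGF at s = 1 and factorial moments (Remark 2) can be checked numerically for the geometric PMF $P\{X = k\} = pq^k$. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the series is truncated at an arbitrary number of terms, and p = 0.4 is an arbitrary choice.

```python
def geometric_pmf(p, k):
    """P{X = k} = p * q**k for k = 0, 1, 2, ..., with q = 1 - p."""
    return p * (1.0 - p) ** k


def factorial_moment(pmf, order, terms=2000):
    """Truncated series for E[X(X-1)...(X-order+1)], i.e. the formal
    value of the order-th derivative of the PGF at s = 1."""
    total = 0.0
    for n in range(order, terms):
        falling = 1
        for j in range(order):
            falling *= n - j  # n(n-1)...(n-order+1)
        total += falling * pmf(n)
    return total


p = 0.4  # illustrative value (assumed)
q = 1.0 - p
pmf = lambda k: geometric_pmf(p, k)

first = factorial_moment(pmf, 1)      # P'(1)  = q/p
second = factorial_moment(pmf, 2)     # P''(1) = 2q^2/p^2
variance = second - first**2 + first  # P''(1) - [P'(1)]^2 + P'(1) = q/p^2
```

The truncation is harmless here because the geometric tail decays exponentially; for heavier-tailed PMFs the number of terms would need more care.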
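Example 4. Consider the PGF

$$P(s) = \left( \frac{1 + s}{2} \right)^n, \qquad -\infty < s < \infty.$$

Expanding the right side into a power series, we get

$$P(s) = \frac{1}{2^n} \sum_{k=0}^{n} \binom{n}{k} s^k = \sum_{k=0}^{n} p_k s^k,$$

and it follows that

$$P\{X = k\} = p_k = \binom{n}{k} \Big/ 2^n, \qquad k = 0, 1, \ldots, n.$$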


We note that the PGF, being defined only for discrete integer-valued RVs, has limited 
utility. We next consider a generating function that is quite useful in probability and 
statistics. 


Definition 2. Let X be an RV defined on $(\Omega, \mathcal{S}, P)$. The function

$$M(s) = E e^{sX} \tag{2}$$

is known as the moment generating function (MGF) of the RV X if the expectation
on the right side of (2) exists in some neighborhood of the origin.

Example 5. Let X have the PMF

$$f(x) = \begin{cases} \dfrac{6}{\pi^2} \cdot \dfrac{1}{k^2}, & x = k, \quad k = 1, 2, \ldots, \\ 0, & \text{otherwise}. \end{cases}$$

Then $(6/\pi^2) \sum_{k=1}^{\infty} e^{sk}/k^2$ is infinite for every s > 0. We see that the MGF of X
does not exist. In fact, $EX = \infty$.


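Example 6. Let X have the PDF

$$f(x) = \begin{cases} \tfrac{1}{2} e^{-x/2}, & x > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$M(s) = \frac{1}{2} \int_0^{\infty} e^{(s - 1/2)x} \, dx = \frac{1}{1 - 2s}, \qquad s < \frac{1}{2}.$$

Example 7. Let X have the PMF

$$P\{X = k\} = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!}, & k = 0, 1, 2, \ldots, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$M(s) = E e^{sX} = e^{-\lambda} \sum_{k=0}^{\infty} e^{sk} \frac{\lambda^k}{k!} = e^{-\lambda(1 - e^s)} \qquad \text{for all } s.$$

The following result will be quite useful subsequently.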
The following result will be quite useful subsequently. 


Theorem 1. The MGF uniquely determines a DF and, conversely, if the MGF 
exists, it is unique. 


For the proof we refer the reader to Widder [116, p. 460], or Curtiss [18]. Theo- 
rem 2 explains why we call M(s) an MGF. 


Theorem 2. If the MGF M(s) of an RV X exists for s in $(-s_0, s_0)$, say, $s_0 > 0$,
the derivatives of all order exist at s = 0 and can be evaluated under the integral sign,
that is,

$$M^{(k)}(s) \big|_{s=0} = EX^k \qquad \text{for positive integral } k. \tag{3}$$


For the proof of Theorem 2, we refer to Widder [116, pp. 446-447]. See also 
Problem 9. 


Remark 3. Alternatively, if the MGF M(s) exists for s in $(-s_0, s_0)$, say, $s_0 > 0$,
one can express M(s) (uniquely) in a Maclaurin series expansion:

$$M(s) = M(0) + \frac{M'(0)}{1!} s + \frac{M''(0)}{2!} s^2 + \cdots, \tag{4}$$

so that $EX^k$ is the coefficient of $s^k/k!$ in expansion (4).


Example 8. Let X be an RV with PDF $f(x) = \tfrac{1}{2} e^{-x/2}$, x > 0. From Example 6,
M(s) = 1/(1 − 2s) for s < 1/2. Thus

$$M'(s) = \frac{2}{(1 - 2s)^2} \quad \text{and} \quad M''(s) = \frac{4 \cdot 2}{(1 - 2s)^3}, \qquad s < \frac{1}{2}.$$

It follows that

$$EX = 2, \qquad EX^2 = 8, \qquad \text{and} \qquad \operatorname{var}(X) = 4.$$
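The moments in Example 8 can also be recovered numerically by differentiating M(s) = 1/(1 − 2s) at s = 0 with finite differences. This is only a rough sketch: the helper `derivative_at_zero` and the step size h are illustrative assumptions, and central differences give approximations rather than exact values.

```python
def mgf(s):
    """MGF of the PDF f(x) = (1/2) exp(-x/2), x > 0; finite only for s < 1/2."""
    if s >= 0.5:
        raise ValueError("M(s) is finite only for s < 1/2")
    return 1.0 / (1.0 - 2.0 * s)


def derivative_at_zero(func, order, h=1e-3):
    """Central-difference approximation to the first or second derivative at 0."""
    if order == 1:
        return (func(h) - func(-h)) / (2.0 * h)
    if order == 2:
        return (func(h) - 2.0 * func(0.0) + func(-h)) / (h * h)
    raise ValueError("only orders 1 and 2 are implemented in this sketch")


first = derivative_at_zero(mgf, 1)   # approximates EX = 2
second = derivative_at_zero(mgf, 2)  # approximates EX^2 = 8
variance = second - first**2         # approximates var(X) = 4
```

The step h must keep both h and −h inside the interval where the MGF exists, which is exactly the neighborhood-of-zero requirement in Definition 2.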


Example 9. Let X be an RV with PDF f(x) = 1, 0 < x < 1, and = 0 otherwise.
Then

$$M(s) = \int_0^1 e^{sx} \, dx = \frac{e^s - 1}{s}, \qquad \text{all } s,$$

$$M'(s) = \frac{e^s \cdot s - (e^s - 1) \cdot 1}{s^2},$$

and

$$EX = M'(0) = \lim_{s \to 0} \frac{s e^s - e^s + 1}{s^2} = \frac{1}{2}.$$

We emphasize that the expectation $E e^{sX}$ does not exist unless s is carefully restricted.
In fact, the requirement that M(s) exists in a neighborhood of zero is a
very strong requirement that is not satisfied by some common distributions. We next
consider a generating function that exists for all distributions.


Definition 3. Let X be an RV. The complex-valued function φ defined on $\mathcal{R}$ by

$$\varphi(t) = E(e^{itX}) = E(\cos tX) + iE(\sin tX), \qquad t \in \mathcal{R},$$

where $i = \sqrt{-1}$ is the imaginary unit, is called the characteristic function (CF) of
RV X.
Clearly,

$$\varphi(t) = \sum_k (\cos t x_k + i \sin t x_k) P(X = x_k)$$

in the discrete case, and

$$\varphi(t) = \int_{-\infty}^{\infty} \cos tx \, f(x) \, dx + i \int_{-\infty}^{\infty} \sin tx \, f(x) \, dx$$

in the continuous case.

Example 10. Let X be a normal RV with PDF

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad x \in \mathcal{R}.$$

Then

$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos tx \, e^{-x^2/2} \, dx + \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sin tx \, e^{-x^2/2} \, dx.$$

Note that sin tx is an odd function and so also is $\sin tx \, e^{-x^2/2}$. Thus the second
integral on the right side vanishes and we have

$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos tx \, e^{-x^2/2} \, dx = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} \cos tx \, e^{-x^2/2} \, dx = e^{-t^2/2}, \qquad t \in \mathcal{R}.$$
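The closed form $e^{-t^2/2}$ for the CF of the standard normal RV can be compared with a direct numerical evaluation of the two integrals in Definition 3. The following sketch uses a plain midpoint rule; the function names, the integration window [−12, 12], and the number of steps are illustrative assumptions rather than anything prescribed by the text.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def cf_standard_normal(t, half_width=12.0, steps=24000):
    """Midpoint-rule approximation of E cos(tX) + i E sin(tX)."""
    h = 2.0 * half_width / steps
    real = 0.0
    imag = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * h
        weight = normal_pdf(x) * h
        real += math.cos(t * x) * weight
        imag += math.sin(t * x) * weight
    return complex(real, imag)


values = {t: cf_standard_normal(t) for t in (0.0, 0.5, 1.0, 2.0)}
errors = {t: abs(values[t] - math.exp(-t * t / 2.0)) for t in values}
```

The imaginary part comes out numerically negligible, reflecting the oddness of $\sin tx \, e^{-x^2/2}$ noted in the example.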

Remark 4. Unlike an MGF that may not exist for some distributions, a CF always
exists, which makes it a much more convenient tool. In fact, it is easy to see
that φ is continuous on $\mathcal{R}$, $|\varphi(t)| \le 1$ for all t, and $\varphi(-t) = \bar{\varphi}(t)$ where $\bar{\varphi}$ is the
complex conjugate of φ. Thus $\bar{\varphi}$ is the CF of −X. Moreover, φ uniquely determines
the DF of RV X. For these and many other properties of characteristic functions, we
need a comprehensive knowledge of complex variable theory, well beyond the scope
of this book. We refer the reader to Lukacs [68].


Finally, we consider the problem of characterizing a distribution from its moments.
Given a set of constants $\{\mu_0 = 1, \mu_1, \mu_2, \ldots\}$, the problem of moments asks
if they can be moments of a distribution function F. At this point it will be worthwhile
to take note of some facts.

First, we have seen that if the MGF $M(s) = E e^{sX}$ exists for some X for s in some
neighborhood of zero, then $E|X|^n < \infty$ for all $n \ge 1$. Suppose, however, that
$E|X|^n < \infty$ for all $n \ge 1$. It does not follow that the MGF of X exists.


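Example 11. Let X be an RV with PDF

$$f(x) = c e^{-|x|^{\alpha}}, \qquad 0 < \alpha < 1, \quad -\infty < x < \infty,$$

where c is a constant determined from

$$c \int_{-\infty}^{\infty} e^{-|x|^{\alpha}} \, dx = 1.$$

Let s > 0. Then

$$\int_0^{\infty} e^{sx} e^{-x^{\alpha}} \, dx = \int_0^{\infty} e^{x(s - x^{\alpha - 1})} \, dx$$

and since α − 1 < 0, $\int_0^{\infty} e^{sx} e^{-x^{\alpha}} \, dx$ is not finite for any s > 0. Hence the MGF
does not exist. But

$$E|X|^n = c \int_{-\infty}^{\infty} |x|^n e^{-|x|^{\alpha}} \, dx = 2c \int_0^{\infty} x^n e^{-x^{\alpha}} \, dx < \infty \qquad \text{for each } n,$$

as is easily checked by substituting $y = x^{\alpha}$.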
Second, two (or more) RVs may have the same set of moments. 
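Example 12. Let X have lognormal PDF

$$f(x) = (x \sqrt{2\pi})^{-1} e^{-(\log x)^2/2}, \qquad x > 0,$$

and f(x) = 0 for $x \le 0$. Let $X_{\varepsilon}$, $|\varepsilon| \le 1$, have PDF

$$f_{\varepsilon}(x) = f(x)[1 + \varepsilon \sin(2\pi \log x)], \qquad x \in \mathcal{R}.$$

[Note that $f_{\varepsilon} \ge 0$ for all ε, $|\varepsilon| \le 1$, and $\int_{-\infty}^{\infty} f_{\varepsilon}(x) \, dx = 1$, so $f_{\varepsilon}$ is a PDF.] Since,
however,

$$\int_0^{\infty} x^k f(x) \sin(2\pi \log x) \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(t^2/2) + kt} \sin(2\pi t) \, dt = \frac{1}{\sqrt{2\pi}} e^{k^2/2} \int_{-\infty}^{\infty} e^{-y^2/2} \sin(2\pi y) \, dy = 0,$$

we see that

$$\int_0^{\infty} x^k f_{\varepsilon}(x) \, dx = \int_0^{\infty} x^k f(x) \, dx$$

for all ε, $|\varepsilon| \le 1$, and k = 0, 1, 2, .... But $f(x) \ne f_{\varepsilon}(x)$.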
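A numerical check of Example 12 is sketched below. After the substitution t = log x, the kth moment of the perturbed density becomes an integral against the standard normal density, which is evaluated here with a midpoint rule. The function name `moment`, the window of half-width 12 centered at t = k, and the step count are illustrative assumptions.

```python
import math


def moment(k, eps, half_width=12.0, steps=24000):
    """Approximate the k-th moment of f_eps using t = log x:
    integral of exp(k t) * phi(t) * (1 + eps * sin(2 pi t)) dt,
    where phi is the standard normal density."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        # The integrand exp(kt - t^2/2) peaks at t = k, so center the window there.
        t = k - half_width + (j + 0.5) * h
        density = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
        total += math.exp(k * t) * density * (1.0 + eps * math.sin(2.0 * math.pi * t)) * h
    return total


# Moments of the lognormal (eps = 0) and of two perturbed densities.
table = {k: (moment(k, 0.0), moment(k, 0.5), moment(k, -1.0)) for k in range(4)}
```

For each k the three entries agree to within numerical error, and the unperturbed value is close to $e^{k^2/2}$, even though the three densities differ.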

Third, moments of any RV X necessarily satisfy certain conditions. For example,
if $\beta_{\nu} = E|X|^{\nu}$, we will see (Theorem 3.4.3) that $(\beta_{\nu})^{1/\nu}$ is an increasing function
of ν. Similarly, the quadratic form

$$E \left( \sum_{i=1}^{n} X^{\alpha_i} t_i \right)^2 \ge 0$$

yields a relation between moments of various orders of X.

The following result, which we do not prove here, gives a sufficient condition for
unique determination of F from its moments.


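Theorem 3. Let $\{m_k\}$ be the moment sequence of an RV X. If the series

$$\sum_{k=1}^{\infty} \frac{m_k}{k!} s^k \tag{5}$$

converges absolutely for some s > 0, then $\{m_k\}$ uniquely determines the DF F of X.

Example 13. Suppose that X has PDF

$$f(x) = e^{-x} \text{ for } x \ge 0 \quad \text{and} \quad = 0 \text{ for } x < 0.$$

Then $EX^k = \int_0^{\infty} x^k e^{-x} \, dx = k!$, and from Theorem 3,

$$\sum_{k=0}^{\infty} \frac{m_k}{k!} s^k = \sum_{k=0}^{\infty} s^k = \frac{1}{1 - s}, \qquad 0 < s < 1,$$

which is the MGF of X.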
In particular, if for some constant c,

$$|m_k| \le c^k, \qquad k = 1, 2, \ldots,$$

then

$$\sum_{k=1}^{\infty} \frac{|m_k|}{k!} s^k \le \sum_{k=1}^{\infty} \frac{(cs)^k}{k!} < e^{cs} \qquad \text{for } s > 0,$$

and the DF of X is determined uniquely. Thus if $P\{|X| \le c\} = 1$ for some c > 0,
then all moments of X exist, satisfying $|m_k| \le c^k$, $k \ge 1$, and the DF of X is
determined uniquely from its moments.

Finally, we mention some sufficient conditions for a moment sequence to determine
a unique DF.

(i) The range of the RV is finite.
(ii) (Carleman) $\sum_{k=1}^{\infty} (m_{2k})^{-1/(2k)} = \infty$ when the range of the RV is $(-\infty, \infty)$.
If the range is $(0, \infty)$, a sufficient condition is $\sum_{k=1}^{\infty} (m_k)^{-1/(2k)} = \infty$.
(iii) $\lim_{n \to \infty} [(m_{2n})^{1/(2n)} / 2n]$ is finite.

PROBLEMS 3.3


1. Find the PGF of the RVs with the following PMFs:
(a) $P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}$, k = 0, 1, 2, ..., n; $0 \le p \le 1$.
(b) $P\{X = k\} = [e^{-\lambda}/(1 - e^{-\lambda})](\lambda^k / k!)$, k = 1, 2, ...; λ > 0.
(c) $P\{X = k\} = p q^k (1 - q^{N+1})^{-1}$, k = 0, 1, 2, ..., N; 0 < p < 1, q = 1 − p.

2. Let X be an integer-valued RV with PGF P(s). Let a and b be nonnegative
integers, and write Y = aX + b. Find the PGF of Y.

3. Let X be an integer-valued RV with PGF P(s), and suppose that the MGF
M(s) exists for $s \in (-s_0, s_0)$, $s_0 > 0$. How are M(s) and P(s) related? Using
$M^{(k)}(s)|_{s=0} = EX^k$ for positive integral k, find $EX^k$ in terms of the derivatives
of P(s) for values of k = 1, 2, 3, 4.

4. For the Cauchy PDF

$$f(x) = \frac{1}{\pi} \, \frac{1}{1 + x^2}, \qquad -\infty < x < \infty,$$

does the MGF exist?

5. Let X be an RV with PMF

$$P\{X = j\} = p_j, \qquad j = 0, 1, 2, \ldots.$$

Set $P\{X > j\} = q_j$, j = 0, 1, 2, .... Clearly, $q_j = p_{j+1} + p_{j+2} + \cdots$, $j \ge 0$.
Write $Q(s) = \sum_{j=0}^{\infty} q_j s^j$. Then the series for Q(s) converges in |s| < 1. Show
that

$$Q(s) = \frac{1 - P(s)}{1 - s} \qquad \text{for } |s| < 1,$$

where P(s) is the PGF of X. Find the mean and the variance of X (when they
exist) in terms of Q and its derivatives.


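6. For the PMF

$$P\{X = j\} = \frac{a_j \theta^j}{f(\theta)}, \qquad j = 0, 1, 2, \ldots, \quad \theta > 0,$$

where $a_j \ge 0$ and $f(\theta) = \sum_{j=0}^{\infty} a_j \theta^j$, find the PGF and the MGF in terms of f.

7. For the Laplace PDF

$$f(x) = \frac{1}{2\lambda} e^{-|x - \mu|/\lambda}, \qquad -\infty < x < \infty; \quad \lambda > 0, \quad -\infty < \mu < \infty,$$

show that the MGF exists and equals

$$M(t) = (1 - \lambda^2 t^2)^{-1} e^{\mu t}, \qquad |t| < \frac{1}{\lambda}.$$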

8. For any integer-valued RV X, show that

$$\sum_{n=0}^{\infty} s^n P\{X \le n\} = (1 - s)^{-1} P(s),$$

where P is the PGF of X.

9. Let X be an RV with MGF M(t), which exists for $t \in (-t_0, t_0)$, $t_0 > 0$. Show
that

$$E|X|^n < n! \, s^{-n} [M(s) + M(-s)]$$

for any fixed s, $0 < s < t_0$, and for each integer $n \ge 1$. Expanding $e^{tX}$ in a
power series, show that for $t \in (-s, s)$, $0 < s < t_0$,

$$M(t) = \sum_{n=0}^{\infty} t^n \frac{EX^n}{n!}.$$

[Since a power series can be differentiated term by term within the interval of
convergence, it follows that for |t| < s,

$$M^{(k)}(t) \big|_{t=0} = EX^k$$

for each integer $k \ge 1$.] (Roy, LePage, and Moore [93])
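10. Let X be an integer-valued random variable with

$$E\{X(X - 1) \cdots (X - k + 1)\} = \begin{cases} k! \dbinom{n}{k} & \text{if } k = 0, 1, 2, \ldots, n, \\ 0 & \text{if } k > n. \end{cases}$$

Show that X must be degenerate at n. [Hint: Prove and use the fact that if $EX^k < \infty$
for all k, then

$$P(s) = \sum_{k=0}^{\infty} \frac{(s - 1)^k}{k!} E[X(X - 1) \cdots (X - k + 1)].$$

Write P(s) as

$$P(s) = \sum_{k=0}^{\infty} P(X = k) s^k = \sum_{k=0}^{\infty} P(X = k) \sum_{i=0}^{k} \binom{k}{i} (s - 1)^i = \sum_{i=0}^{\infty} (s - 1)^i \sum_{k=i}^{\infty} \binom{k}{i} P(X = k).]$$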
11. Let p(n, k) = f(n, k)/n!, where f(n, k) is given by

$$f(n + 1, k) = f(n, k) + f(n, k - 1) + \cdots + f(n, k - n)$$

for $k = 0, 1, \ldots, \binom{n}{2}$ and

$$f(n, k) = 0 \text{ for } k < 0, \qquad f(1, 0) = 1, \qquad f(1, k) = 0 \text{ otherwise}.$$

Let

$$P_n(s) = \frac{1}{n!} \sum_{k=0}^{\infty} s^k f(n, k)$$

be the probability generating function of p(n, k). Show that

$$P_n(s) = (n!)^{-1} \prod_{k=1}^{n} \frac{1 - s^k}{1 - s}, \qquad |s| < 1.$$

($P_n$ is the generating function of Kendall's τ-statistic.)
12. For $k = 0, 1, \ldots, \binom{n}{2}$, let $u_n(k)$ be defined recursively by

$$u_n(k) = u_{n-1}(k - n) + u_{n-1}(k)$$

with $u_0(0) = 1$, $u_0(k) = 0$ otherwise, and $u_n(k) = 0$ for k < 0. Let $P_n(s) = \sum_{k=0}^{\infty} s^k u_n(k)$ be the generating function of $\{u_n\}$. Show that

$$P_n(s) = \prod_{j=1}^{n} (1 + s^j) \qquad \text{for } |s| < 1.$$

If $p_n(k) = u_n(k)/2^n$, find $\{p_n(k)\}$ for n = 2, 3, 4. ($P_n$ is the generating function
of the one-sample Wilcoxon test statistic.)


3.4 SOME MOMENT INEQUALITIES

In this section we derive some inequalities for moments of an RV. The main result of
this section is Theorem 1 (and its corollary), which gives a bound for tail probability
in terms of some moment of the random variable.

Theorem 1. Let h(X) be a nonnegative Borel-measurable function of an RV X.
If Eh(X) exists, then for every ε > 0,

$$P\{h(X) \ge \varepsilon\} \le \frac{Eh(X)}{\varepsilon}. \tag{1}$$


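Proof. We prove the result when X is discrete. Let $P\{X = x_k\} = p_k$, k =
1, 2, .... Then

$$Eh(X) = \sum_k h(x_k) p_k = \left( \sum_{A} + \sum_{A^c} \right) h(x_k) p_k,$$

where

$$A = \{k : h(x_k) \ge \varepsilon\}.$$

Then

$$Eh(X) \ge \sum_{A} h(x_k) p_k \ge \varepsilon \sum_{A} p_k = \varepsilon P\{h(X) \ge \varepsilon\}.$$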


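Corollary. Let $h(X) = |X|^r$ and $\varepsilon = K^r$, where r > 0 and K > 0. Then

$$P\{|X| \ge K\} \le \frac{E|X|^r}{K^r}, \tag{2}$$

which is Markov's inequality. In particular, if we take $h(X) = (X - \mu)^2$, $\varepsilon = K^2 \sigma^2$,
we get the Chebychev-Bienayme inequality:

$$P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}, \tag{3}$$

where $EX = \mu$, $\operatorname{var}(X) = \sigma^2$.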


Remark 1. The inequality (3) is generally attributed to Chebychev, although re- 
cent research has shown that credit should also go to I. J. Bienayme. 


Remark 2. If we wish to be consistent with our definition of a DF as $F_X(x) = P(X \le x)$, then we may want to reformulate (1) in the following form:

$$P\{h(X) > \varepsilon\} \le \frac{Eh(X)}{\varepsilon}.$$

For RVs with finite second-order moments, one cannot do better than the inequality in (3).

Example 1. Let X have the PMF

$$P\{X = 0\} = 1 - \frac{1}{K^2}, \qquad P\{X = \mp 1\} = \frac{1}{2K^2}, \qquad K > 1, \text{ constant}.$$

Then

$$EX = 0, \qquad EX^2 = \frac{1}{K^2}, \qquad \sigma = \frac{1}{K},$$

and

$$P\{|X| \ge K\sigma\} = P\{|X| \ge 1\} = \frac{1}{K^2},$$

so that equality is achieved.
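The claim that Chebychev's bound is attained can be verified exactly for the three-point distribution with $P\{X = 0\} = 1 - 1/K^2$ and $P\{X = \pm 1\} = 1/(2K^2)$. The sketch below is illustrative; the function names and the choice K = 3 are assumptions, and integer K is used so that σ = 1/K is rational.

```python
from fractions import Fraction


def three_point_pmf(K):
    """PMF with mass 1 - 1/K^2 at 0 and 1/(2K^2) at each of -1 and +1."""
    K = Fraction(K)
    return {0: 1 - 1 / K**2, 1: 1 / (2 * K**2), -1: 1 / (2 * K**2)}


def mean_and_variance(pmf):
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var


def tail_probability(pmf, center, threshold):
    """P{|X - center| >= threshold}."""
    return sum(p for x, p in pmf.items() if abs(x - center) >= threshold)


K = 3  # illustrative value (assumed)
pmf = three_point_pmf(K)
mean, var = mean_and_variance(pmf)
sigma = Fraction(1, K)                              # since var = 1/K^2
exact_tail = tail_probability(pmf, mean, K * sigma)  # P{|X| >= 1}
chebychev_bound = Fraction(1, K**2)
```

Here `exact_tail` and `chebychev_bound` coincide, so no distribution-free improvement of (3) is possible using second moments alone.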


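Example 2. Let X be distributed with PDF f(x) = 1 if 0 < x < 1, and = 0
otherwise. Then

$$EX = \frac{1}{2}, \qquad EX^2 = \frac{1}{3}, \qquad \operatorname{var}(X) = \frac{1}{3} - \frac{1}{4} = \frac{1}{12},$$

and

$$P\left\{ \left| X - \frac{1}{2} \right| < 2\sqrt{\frac{1}{12}} \right\} = P\left\{ \frac{1}{2} - \frac{1}{\sqrt{3}} < X < \frac{1}{2} + \frac{1}{\sqrt{3}} \right\} = 1.$$

From Chebychev's inequality

$$P\left\{ \left| X - \frac{1}{2} \right| < 2\sqrt{\frac{1}{12}} \right\} \ge 1 - \frac{1}{4} = 0.75.$$

In Fig. 1 we compare the upper bound for $P\{|X - \tfrac{1}{2}| \ge k/\sqrt{12}\}$ with the exact
probability.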


It is possible to improve upon Chebychev’s inequality, at least in some cases, if 
we assume the existence of higher-order moments. We need the following lemma. 


Lemma 1. Let X be an RV with EX = 0 and $\operatorname{var}(X) = \sigma^2$. Then

$$P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2} \qquad \text{if } x > 0, \tag{4}$$

and

$$P\{X \ge x\} \ge \frac{x^2}{\sigma^2 + x^2} \qquad \text{if } x < 0. \tag{5}$$

Fig. 1. Chebychev upper bound versus exact probability.


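Proof. Let $h(t) = (t + c)^2$, c > 0. Then $h(t) \ge 0$ for all t and

$$h(t) \ge (x + c)^2 \qquad \text{for } t \ge x > 0.$$

It follows that

$$P\{X \ge x\} \le P\{h(X) \ge (x + c)^2\} \le \frac{E(X + c)^2}{(x + c)^2} \qquad \text{for all } c > 0, \quad x > 0. \tag{6}$$

Since EX = 0 and $EX^2 = \sigma^2$, the right side of (6) equals $(\sigma^2 + c^2)/(x + c)^2$, which is minimum when $c = \sigma^2/x$.
We have

$$P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2}, \qquad x > 0.$$

A similar proof holds for (5).

Remark 3. Inequalities (4) and (5) cannot be improved (Problem 3).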


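Theorem 2. Let $E|X|^4 < \infty$, and let EX = 0, $EX^2 = \sigma^2$. Then

$$P\{|X| \ge K\sigma\} \le \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2 \sigma^4} \qquad \text{for } K > 1, \tag{7}$$

where $\mu_4 = EX^4$.

Proof. For the proof, let us substitute $(X^2 - \sigma^2)/(K^2 \sigma^2 - \sigma^2)$ for X and take
x = 1 in (4). Then

$$P\{X^2 - \sigma^2 \ge K^2 \sigma^2 - \sigma^2\} \le \frac{\operatorname{var}\{(X^2 - \sigma^2)/(K^2 \sigma^2 - \sigma^2)\}}{1 + \operatorname{var}\{(X^2 - \sigma^2)/(K^2 \sigma^2 - \sigma^2)\}} = \frac{\mu_4 - \sigma^4}{\sigma^4 (K^2 - 1)^2 + \mu_4 - \sigma^4} = \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2 \sigma^4},$$

as asserted.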


Remark 4. Bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and worse if
$1 \le K^2 < \mu_4/\sigma^4$ (Problem 5).


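Example 3. Let X have the uniform density

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise}. \end{cases}$$

Then

$$EX = \frac{1}{2}, \qquad \operatorname{var}(X) = \frac{1}{12}, \qquad \mu_4 = E\left\{ X - \frac{1}{2} \right\}^4 = \frac{1}{80},$$

and

$$P\left\{ \left| X - \frac{1}{2} \right| \ge 2\sqrt{\frac{1}{12}} \right\} \le \frac{\frac{1}{80} - \frac{1}{144}}{\frac{1}{80} + \frac{16}{144} - \frac{8}{144}} = \frac{4}{49},$$

that is,

$$P\left\{ \left| X - \frac{1}{2} \right| < 2\sqrt{\frac{1}{12}} \right\} \ge \frac{45}{49},$$

which is much better than the bound given by Chebychev's inequality (Example 2).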
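The two bounds compared in Examples 2 and 3 are simple rational functions of K, σ² and μ₄, so they can be evaluated exactly. The sketch below is illustrative; the function names are hypothetical, and the fourth-moment bound is applied to the centered variable X − 1/2, as in the example.

```python
from fractions import Fraction


def chebychev_bound(K):
    """Upper bound 1/K^2 for P{|X - mu| >= K sigma}."""
    return Fraction(1) / Fraction(K) ** 2


def fourth_moment_bound(K, sigma2, mu4):
    """Upper bound (mu4 - s^4) / (mu4 + s^4 K^4 - 2 K^2 s^4), valid for K > 1,
    where s^4 = sigma2^2 and mu4 is the fourth central moment."""
    K = Fraction(K)
    sigma4 = sigma2**2
    return (mu4 - sigma4) / (mu4 + sigma4 * K**4 - 2 * K**2 * sigma4)


# Uniform distribution on (0, 1): variance 1/12, fourth central moment 1/80.
sigma2 = Fraction(1, 12)
mu4 = Fraction(1, 80)

cheb = chebychev_bound(2)                      # 1/4
sharper = fourth_moment_bound(2, sigma2, mu4)  # 4/49
```

With K = 2 we have $K^2 = 4 \ge \mu_4/\sigma^4 = 9/5$, which is the regime in which Remark 4 says the fourth-moment bound is the better of the two.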


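Theorem 3 (Lyapunov Inequality). Let $\beta_n = E|X|^n < \infty$. Then for arbitrary
k, $2 \le k \le n$, we have

$$\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}. \tag{8}$$

Proof. Consider the quadratic form:

$$Q(u, v) = \int_{-\infty}^{\infty} \left( u|x|^{(k-1)/2} + v|x|^{(k+1)/2} \right)^2 f(x) \, dx,$$

where we have assumed that X is continuous with PDF f. We have

$$Q(u, v) = u^2 \beta_{k-1} + 2uv\beta_k + \beta_{k+1} v^2.$$

Clearly, $Q \ge 0$ for all u, v real. It follows that

$$\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0,$$

implying that

$$\beta_k^{2k} \le \beta_{k-1}^{k} \beta_{k+1}^{k}.$$

Thus

$$\beta_1^2 \le \beta_0^1 \beta_2^1, \qquad \beta_2^4 \le \beta_1^2 \beta_3^2, \qquad \ldots, \qquad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{n-1} \beta_n^{n-1},$$

where $\beta_0 = 1$. Multiplying successive k − 1 of these, we have

$$\beta_{k-1}^{k} \le \beta_k^{k-1} \qquad \text{or} \qquad \beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.$$

It follows that

$$\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots \le \beta_n^{1/n}.$$

The equality holds if and only if

$$\beta_k^{1/k} = \beta_{k+1}^{1/(k+1)} \qquad \text{for } k = 1, 2, \ldots;$$

that is, $\{\beta_k^{1/k}\}$ is a constant sequence of numbers, which happens if and only if |X| is
degenerate; that is, for some c, P{|X| = c} = 1.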
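The monotonicity of $\beta_k^{1/k}$ and the equality case can be illustrated on small discrete distributions. The sketch below is an assumption-laden illustration, not part of the proof: the function name `abs_moment_roots` and the two example distributions are made up for this purpose, and floating-point comparisons use a small tolerance.

```python
def abs_moment_roots(values, probs, n):
    """Return [beta_k ** (1/k) for k = 1..n], where beta_k = E|X|^k."""
    roots = []
    for k in range(1, n + 1):
        beta_k = sum(abs(x) ** k * p for x, p in zip(values, probs))
        roots.append(beta_k ** (1.0 / k))
    return roots


# A nondegenerate example (values and probabilities chosen arbitrarily).
roots = abs_moment_roots([-2, 0, 1, 3], [0.1, 0.4, 0.3, 0.2], 6)
is_nondecreasing = all(a <= b + 1e-12 for a, b in zip(roots, roots[1:]))

# |X| degenerate at 2: X takes the values -2 and 2, so every root equals 2.
flat = abs_moment_roots([-2, 2], [0.5, 0.5], 6)
```

In the second example X itself is not degenerate, only |X| is, which matches the equality condition stated at the end of the proof.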


PROBLEMS 3.4 


1. For the RV with PDF

$$f(x; \lambda) = \frac{e^{-x} x^{\lambda}}{\lambda!}, \qquad x > 0,$$

where $\lambda \ge 0$ is an integer, show that

$$P\{0 < X < 2(\lambda + 1)\} > \frac{\lambda}{\lambda + 1}.$$

2. Let X be any RV, and suppose that the MGF of X, $M(t) = E e^{tX}$, exists for every
t > 0. Then for any t > 0,

$$P\{tX > s^2 + \log M(t)\} < e^{-s^2}.$$


3. Construct an example to show that inequalities (4) and (5) cannot be improved. 


4. Let g(·) be a function satisfying g(x) > 0 for x > 0, g(x) increasing for x > 0,
and $E|g(X)| < \infty$. Show that

$$P\{|X| > \varepsilon\} < \frac{Eg(|X|)}{g(\varepsilon)}$$

for every ε > 0.


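5. Let X be an RV with EX = 0, $\operatorname{var}(X) = \sigma^2$, and $EX^4 = \mu_4$. Let K be any
positive real number. Show that

$$P\{|X| \ge K\sigma\} \le \begin{cases} 1 & \text{if } K^2 < 1, \\[4pt] \dfrac{1}{K^2} & \text{if } 1 \le K^2 < \dfrac{\mu_4}{\sigma^4}, \\[8pt] \dfrac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2 \sigma^4} & \text{if } K^2 \ge \dfrac{\mu_4}{\sigma^4}. \end{cases}$$

In other words, show that bound (7) is better than bound (3) if $K^2 \ge \mu_4/\sigma^4$ and
worse if $1 \le K^2 < \mu_4/\sigma^4$. Construct an example to show that the last inequalities
cannot be improved.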


6. Use Chebychev's inequality to show that for any k > 1, $e^{k+1} \ge k^2$.

7. For any RV X, show that

$$P\{X \ge 0\} \le \inf\{\varphi(t) : t > 0\} \le 1,$$

where $\varphi(t) = E e^{tX}$, $0 < \varphi(t) \le \infty$.

8. Let X be an RV such that $P(a \le X \le b) = 1$ where $-\infty < a < b < \infty$. Show
that $\operatorname{var}(X) \le (b - a)^2/4$.


CHAPTER 4 


Multiple Random Variables 


4.1 INTRODUCTION 


In many experiments an observation is expressible, not as a single numerical quan- 
tity, but as a family of several separate numerical quantities. For example, if a pair of 
distinguishable dice is tossed, the outcome is a pair (x, y), where x denotes the face 
value on the first die, and y, the face value on the second die. Similarly, to record 
the height and weight of every person in a certain community, we need a pair (x, y), 
where the components represent, respectively, the height and the weight of a partic- 
ular person. To be able to describe such experiments mathematically, we must study 
multidimensional random variables. 

In Section 4.2 we introduce the basic notations involved and study joint, marginal, 
and conditional distributions. In Section 4.3 we examine independent random vari- 
ables and investigate some consequences of independence. Section 4.4 deals with 
functions of several random variables and their induced distributions. In Section 4.5 
we consider moments, covariance, and correlation, and in Section 4.6 we study con- 
ditional expectation. The last section deals with ordered observations. 


4.2 MULTIPLE RANDOM VARIABLES 


In this section we study multidimensional RVs. Let (Q, S, P) be a fixed but other- 
wise arbitrary probability space. 


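Definition 1. The collection $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ defined on $(\Omega, \mathcal{S}, P)$ into
$\mathcal{R}_n$ by

$$\mathbf{X}(\omega) = (X_1(\omega), X_2(\omega), \ldots, X_n(\omega)), \qquad \omega \in \Omega,$$

is called an n-dimensional RV if the inverse image of every n-dimensional interval

$$I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i, \ a_i \in \mathcal{R}, \ i = 1, 2, \ldots, n\}$$

is also in $\mathcal{S}$, that is, if

$$\mathbf{X}^{-1}(I) = \{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \in \mathcal{S} \qquad \text{for } a_i \in \mathcal{R}.$$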


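Theorem 1. Let $X_1, X_2, \ldots, X_n$ be n RVs on $(\Omega, \mathcal{S}, P)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is an n-dimensional RV on $(\Omega, \mathcal{S}, P)$.

Proof. Let $I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i, \ i = 1, 2, \ldots, n\}$. Then

$$\{(X_1, X_2, \ldots, X_n) \in I\} = \{\omega : X_1(\omega) \le a_1, X_2(\omega) \le a_2, \ldots, X_n(\omega) \le a_n\} = \bigcap_{k=1}^{n} \{\omega : X_k(\omega) \le a_k\} \in \mathcal{S},$$

as asserted.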

From now on we restrict attention to two-dimensional random variables. The dis- 
cussion for the n-dimensional (n > 2) case is similar except when indicated. The 
development follows closely the one-dimensional case. 

Definition 2. The function F(·, ·), defined by

$$F(x, y) = P\{X \le x, Y \le y\}, \qquad \text{all } (x, y) \in \mathcal{R}_2, \tag{1}$$

is known as the DF of the RV (X, Y).


Following the discussion in Section 2.3, it is easily shown that 


(i) F(x, y) is nondecreasing and continuous from the right with respect to each 
coordinate, and 


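(ii) $\displaystyle \lim_{\substack{x \to +\infty \\ y \to +\infty}} F(x, y) = F(+\infty, +\infty) = 1,$

$\displaystyle \lim_{y \to -\infty} F(x, y) = F(x, -\infty) = 0$ for all x,

$\displaystyle \lim_{x \to -\infty} F(x, y) = F(-\infty, y) = 0$ for all y.

But (i) and (ii) are not sufficient conditions to make any function F(·, ·) a DF.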
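Example 1. Let F be a function (Fig. 1) of two variables defined by

$$F(x, y) = \begin{cases} 0, & x < 0 \text{ or } x + y < 1 \text{ or } y < 0, \\ 1, & \text{otherwise}. \end{cases}$$

Fig. 1. The region where F(x, y) = 1.

Then F satisfies both (i) and (ii) above. However, F is not a DF since

$$P\{\tfrac{1}{3} < X \le 1, \ \tfrac{1}{3} < Y \le 1\} = F(1, 1) + F(\tfrac{1}{3}, \tfrac{1}{3}) - F(1, \tfrac{1}{3}) - F(\tfrac{1}{3}, 1) = 1 + 0 - 1 - 1 = -1 \not\ge 0.$$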
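The failure in Example 1 is easy to reproduce by evaluating the rectangle formula directly. The sketch below assumes the function F of the example (0 when x < 0, y < 0, or x + y < 1, and 1 otherwise) and the corner value 1/3; the helper name `rectangle_mass` is hypothetical.

```python
from fractions import Fraction


def F(x, y):
    """Candidate 'DF' of Example 1: 0 if x < 0, y < 0, or x + y < 1; else 1."""
    if x < 0 or y < 0 or x + y < 1:
        return 0
    return 1


def rectangle_mass(F, x1, x2, y1, y2):
    """F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1), which would be
    P{x1 < X <= x2, y1 < Y <= y2} if F were a genuine DF."""
    return F(x2, y2) + F(x1, y1) - F(x1, y2) - F(x2, y1)


third = Fraction(1, 3)
mass = rectangle_mass(F, third, 1, third, 1)
```

A negative value of `mass` shows that monotonicity and the limit conditions in each coordinate are not enough; the rectangle inequality must be imposed separately.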


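Let $x_1 < x_2$ and $y_1 < y_2$. We have

$$\begin{aligned} P\{x_1 < X \le x_2, \ y_1 < Y \le y_2\} &= P\{X \le x_2, Y \le y_2\} + P\{X \le x_1, Y \le y_1\} \\ &\quad - P\{X \le x_1, Y \le y_2\} - P\{X \le x_2, Y \le y_1\} \\ &= F(x_2, y_2) + F(x_1, y_1) - F(x_1, y_2) - F(x_2, y_1) \\ &\ge 0 \end{aligned}$$

for all pairs $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$, $y_1 < y_2$ (see Fig. 2).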


Theorem 2. A function F of two variables is a DF of some two-dimensional RV
if and only if it satisfies the following conditions:

(i) F is nondecreasing and right continuous with respect to both arguments.
(ii) $F(-\infty, y) = F(x, -\infty) = 0$ and $F(+\infty, +\infty) = 1$.
(iii) For every $(x_1, y_1)$, $(x_2, y_2)$ with $x_1 < x_2$ and $y_1 < y_2$ the inequality

$$F(x_2, y_2) - F(x_2, y_1) + F(x_1, y_1) - F(x_1, y_2) \ge 0 \tag{2}$$

holds.

Fig. 2. $\{x_1 < x \le x_2, \ y_1 < y \le y_2\}$.

The "only if" part of the theorem (a DF satisfies (i) to (iii)) has already been established. The "if" part will
not be proved here (see Tucker [113, p. 26]).
Theorem 2 can be generalized to the n-dimensional case in the following manner. 


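Theorem 3. A function $F(x_1, x_2, \ldots, x_n)$ is the joint DF of some n-dimensional
RV if and only if F is nondecreasing and continuous from the right with respect to
all the arguments $x_1, x_2, \ldots, x_n$ and satisfies the following conditions:

(i) $F(-\infty, x_2, \ldots, x_n) = F(x_1, -\infty, x_3, \ldots, x_n) = \cdots = F(x_1, \ldots, x_{n-1}, -\infty) = 0$,
$F(+\infty, +\infty, \ldots, +\infty) = 1$.

(ii) For every $(x_1, x_2, \ldots, x_n) \in \mathcal{R}_n$ and all $\varepsilon_i > 0$ (i = 1, 2, ..., n), the inequality

$$\begin{aligned} & F(x_1 + \varepsilon_1, x_2 + \varepsilon_2, \ldots, x_n + \varepsilon_n) \\ & \quad - \sum_{i=1}^{n} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_n + \varepsilon_n) \\ & \quad + \sum_{\substack{i,j=1 \\ i<j}}^{n} F(x_1 + \varepsilon_1, \ldots, x_{i-1} + \varepsilon_{i-1}, x_i, x_{i+1} + \varepsilon_{i+1}, \ldots, x_{j-1} + \varepsilon_{j-1}, x_j, x_{j+1} + \varepsilon_{j+1}, \ldots, x_n + \varepsilon_n) \\ & \quad + \cdots + (-1)^n F(x_1, x_2, \ldots, x_n) \ge 0 \end{aligned} \tag{3}$$

holds.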


We restrict ourselves here to two-dimensional RVs of the discrete or continuous 
type, which we now define. 


Definition 3. A two-dimensional (or bivariate) RV (X, Y) is said to be of the
discrete type if it takes on pairs of values belonging to a countable set of pairs A with
probability 1. We call every pair $(x_i, y_j)$ that is assumed with positive probability
$p_{ij}$ a jump point of the DF of (X, Y), and call $p_{ij}$ the jump at $(x_i, y_j)$. Here A is the
support of the distribution of (X, Y).

Clearly, $\sum_{i,j} p_{ij} = 1$. As for the DF of (X, Y), we have

$$F(x, y) = \sum_{B} p_{ij},$$

where $B = \{(i, j) : x_i \le x, \ y_j \le y\}$.


Definition 4. Let (X, Y) be an RV of the discrete type that takes on pairs of values
$(x_i, y_j)$, i = 1, 2, ... and j = 1, 2, .... We call

$$p_{ij} = P\{X = x_i, Y = y_j\}, \qquad i = 1, 2, \ldots, \quad j = 1, 2, \ldots,$$

the joint probability mass function (PMF) of (X, Y).


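Example 2. A die is rolled, and a coin is tossed independently. Let X be the face
value on the die, and let Y = 0 if a tail turns up and Y = 1 if a head turns up. Then

$$A = \{(1, 0), (2, 0), \ldots, (6, 0), (1, 1), (2, 1), \ldots, (6, 1)\},$$

and

$$p_{ij} = \frac{1}{12} \qquad \text{for } i = 1, 2, \ldots, 6; \quad j = 0, 1.$$

The DF of (X, Y) is given by

$$F(x, y) = \begin{cases} 0, & x < 1, \ -\infty < y < \infty; \quad -\infty < x < \infty, \ y < 0, \\ \tfrac{1}{12}, & 1 \le x < 2, \ 0 \le y < 1, \\ \tfrac{1}{6}, & 2 \le x < 3, \ 0 \le y < 1; \quad 1 \le x < 2, \ 1 \le y, \\ \tfrac{1}{4}, & 3 \le x < 4, \ 0 \le y < 1, \\ \tfrac{1}{3}, & 4 \le x < 5, \ 0 \le y < 1; \quad 2 \le x < 3, \ 1 \le y, \\ \tfrac{5}{12}, & 5 \le x < 6, \ 0 \le y < 1, \\ \tfrac{1}{2}, & 6 \le x, \ 0 \le y < 1; \quad 3 \le x < 4, \ 1 \le y, \\ \tfrac{2}{3}, & 4 \le x < 5, \ 1 \le y, \\ \tfrac{5}{6}, & 5 \le x < 6, \ 1 \le y, \\ 1, & 6 \le x, \ 1 \le y. \end{cases}$$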
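The piecewise DF of the die-and-coin example can be generated mechanically from the joint PMF by summing $p_{ij}$ over the set $B = \{(i, j) : x_i \le x, \ y_j \le y\}$. The following sketch assumes the uniform mass 1/12 on each of the twelve support points; the names `support` and `joint_df` are illustrative.

```python
from fractions import Fraction

# Support of (X, Y): die face i in 1..6, coin indicator j in {0, 1}.
support = [(i, j) for i in range(1, 7) for j in (0, 1)]
mass = Fraction(1, 12)


def joint_df(x, y):
    """F(x, y) = P{X <= x, Y <= y}: add the jump at every support point
    lying to the lower-left of (x, y), boundaries included."""
    return sum(mass for (i, j) in support if i <= x and j <= y)


# A few spot checks against the table of the example.
samples = {
    (0.5, 5.0): joint_df(0.5, 5.0),  # x < 1
    (2.5, 0.5): joint_df(2.5, 0.5),  # 2 <= x < 3, 0 <= y < 1
    (3.0, 1.0): joint_df(3.0, 1.0),  # 3 <= x < 4, 1 <= y
    (4.2, 2.0): joint_df(4.2, 2.0),  # 4 <= x < 5, 1 <= y
    (6.0, 1.0): joint_df(6.0, 1.0),  # 6 <= x, 1 <= y
}
```

Evaluating `joint_df` at a jump point and just to its left shows the right-continuity required of a DF.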
Theorem 4. A collection of nonnegative numbers $\{p_{ij} : i = 1, 2, \ldots; \ j = 1, 2, \ldots\}$ satisfying $\sum_{i,j=1}^{\infty} p_{ij} = 1$ is the PMF of some RV.

The proof of Theorem 4 is easy to construct with the help of Theorem 2.


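Definition 5. A two-dimensional RV (X, Y) is said to be of the continuous type
if there exists a nonnegative function f(·, ·) such that for every pair $(x, y) \in \mathcal{R}_2$ we
have

$$F(x, y) = \int_{-\infty}^{x} \left[ \int_{-\infty}^{y} f(u, v) \, dv \right] du, \tag{4}$$

where F is the DF of (X, Y). The function f is called the (joint) PDF of (X, Y).
Clearly,

$$F(+\infty, +\infty) = \lim_{\substack{x \to +\infty \\ y \to +\infty}} \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v) \, dv \, du = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u, v) \, dv \, du = 1.$$

If f is continuous at (x, y), then

$$\frac{\partial^2 F(x, y)}{\partial x \, \partial y} = f(x, y). \tag{5}$$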
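Example 3. Let (X, Y) be an RV with joint PDF (Fig. 3) given by

$$f(x, y) = \begin{cases} e^{-(x + y)}, & 0 < x < \infty, \quad 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 3. f(x, y) = exp[−(x + y)], x > 0, y > 0.

Then

$$F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}), & 0 < x < \infty, \quad 0 < y < \infty, \\ 0, & \text{otherwise}. \end{cases}$$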


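Theorem 5. If f is a nonnegative function satisfying $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \, dx \, dy = 1$, then f is the joint density function of some RV.

Proof. For the proof, define

$$F(x, y) = \int_{-\infty}^{x} \left( \int_{-\infty}^{y} f(u, v) \, dv \right) du$$

and use Theorem 2.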
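Let (X, Y) be a two-dimensional RV with PMF

$$p_{ij} = P\{X = x_i, Y = y_j\}.$$

Then

$$\sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{Y = y_j\} \tag{6}$$

and

$$\sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{X = x_i\}. \tag{7}$$

Let us write

$$p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij} \qquad \text{and} \qquad p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}. \tag{8}$$

Then $p_{i\cdot} \ge 0$ and $\sum_{i=1}^{\infty} p_{i\cdot} = 1$, $p_{\cdot j} \ge 0$ and $\sum_{j=1}^{\infty} p_{\cdot j} = 1$, and $\{p_{i\cdot}\}$, $\{p_{\cdot j}\}$
represent PMFs.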

Definition 6. The collection of numbers $\{p_{i\cdot}\}$ is called the marginal PMF of X,
and the collection $\{p_{\cdot j}\}$, the marginal PMF of Y.

Example 4. A fair coin is tossed three times. Let X = number of heads in three
tossings, and Y = difference, in absolute value, between number of heads and number
of tails. The joint PMF of (X, Y) is given in the following table:

    Y \ X        0      1      2      3     P{Y = y}
      1          0     3/8    3/8     0       6/8
      3         1/8     0      0     1/8      2/8
    P{X = x}    1/8    3/8    3/8    1/8       1

The marginal PMF of Y is shown in the column representing row totals, and the
marginal PMF of X, in the row representing column totals.
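The joint and marginal PMFs of Example 4 can be obtained by enumerating the eight equally likely outcomes of three tosses and applying (6) and (7). This is a minimal sketch; the variable names are illustrative, and exact fractions are used so the row and column totals can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF: each of the 8 outcomes of three fair tosses has probability 1/8.
joint = defaultdict(Fraction)
for tosses in product("HT", repeat=3):
    heads = tosses.count("H")
    tails = 3 - heads
    joint[(heads, abs(heads - tails))] += Fraction(1, 8)

# Marginals: sum the joint PMF over the other coordinate.
marginal_x = defaultdict(Fraction)
marginal_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p  # p_{i.} = sum over j of p_{ij}
    marginal_y[y] += p  # p_{.j} = sum over i of p_{ij}
```

Only four of the eight cells of the (X, Y) table carry positive mass, because Y is a function of X here: Y = |2X − 3|.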


If (X, Y) is an RV of the continuous type with PDF f, then

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y) \, dy \tag{9}$$

and

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y) \, dx \tag{10}$$

satisfy $f_1(x) \ge 0$, $f_2(y) \ge 0$, and $\int_{-\infty}^{\infty} f_1(x) \, dx = 1$, $\int_{-\infty}^{\infty} f_2(y) \, dy = 1$. It follows
that $f_1(x)$ and $f_2(y)$ are PDFs.

Definition 7. The functions $f_1(x)$ and $f_2(y)$, defined in (9) and (10), are called
the marginal PDF of X and the marginal PDF of Y, respectively.


Example 5. Let (X, Y) be jointly distributed with PDF f(x, y) = 2, 0 < x <
y < 1, and = 0 otherwise (Fig. 4). Then

Fig. 4. f(x, y) = 2, 0 < x < y < 1.

$$f_1(x) = \int_x^1 2 \, dy = \begin{cases} 2 - 2x, & 0 < x < 1, \\ 0, & \text{otherwise} \end{cases}$$

and

$$f_2(y) = \int_0^y 2 \, dx = \begin{cases} 2y, & 0 < y < 1, \\ 0, & \text{otherwise} \end{cases}$$

are the two marginal density functions.


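Definition 8. Let (X, Y) be an RV with DF F. Then the marginal DF of X is
defined by

$$F_1(x) = F(x, \infty) = \lim_{y \to \infty} F(x, y) = \begin{cases} \sum_{x_i \le x} p_{i\cdot} & \text{if } (X, Y) \text{ is discrete}, \\[4pt] \int_{-\infty}^{x} f_1(t) \, dt & \text{if } (X, Y) \text{ is continuous}. \end{cases} \tag{11}$$

A similar definition is given for the marginal DF of Y.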


In general, given a DF $F(x_1, x_2, \ldots, x_n)$ of an n-dimensional RV $(X_1, X_2, \ldots, X_n)$,
one can obtain any k-dimensional ($1 \le k \le n - 1$) marginal DF from it. Thus
the marginal DF of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$, is given
by

$$\lim_{\substack{x_i \to \infty \\ i \ne i_1, i_2, \ldots, i_k}} F(x_1, x_2, \ldots, x_n) = F(+\infty, \ldots, +\infty, x_{i_1}, +\infty, \ldots, +\infty, x_{i_k}, +\infty, \ldots, +\infty).$$


We now consider the concept of conditional distributions. Let $(X, Y)$ be an RV
of the discrete type with PMF $p_{ij} = P\{X = x_i, Y = y_j\}$. The marginal PMFs
are $p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij}$ and $p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}$. Recall that if $A, B \in \mathcal{S}$ and $PB > 0$, the
conditional probability of $A$, given $B$, is defined by

$$P\{A \mid B\} = \frac{P(AB)}{PB}.$$

Take $A = \{X = x_i\} = \{(x_i, y): -\infty < y < \infty\}$ and $B = \{Y = y_j\} = \{(x, y_j): -\infty < x < \infty\}$,
and assume that $PB = P\{Y = y_j\} = p_{\cdot j} > 0$. Then
$A \cap B = \{X = x_i, Y = y_j\}$, and

$$P\{A \mid B\} = P\{X = x_i \mid Y = y_j\} = \frac{p_{ij}}{p_{\cdot j}}.$$

For fixed $j$, the function $P\{X = x_i \mid Y = y_j\} \ge 0$ and $\sum_{i=1}^{\infty} P\{X = x_i \mid Y = y_j\} = 1$.
Thus $P\{X = x_i \mid Y = y_j\}$, for fixed $j$, defines a PMF.


Definition 9. Let $(X, Y)$ be an RV of the discrete type. If $P\{Y = y_j\} > 0$, the
function

(12) $$P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}}$$

for fixed $j$ is known as the conditional PMF of $X$, given $Y = y_j$. A similar definition
is given for $P\{Y = y_j \mid X = x_i\}$, the conditional PMF of $Y$, given $X = x_i$, provided
that $P\{X = x_i\} > 0$.


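Example 6. For the joint PMF of Example 4, we have for $Y = 1$,

$$P\{X = i \mid Y = 1\} = \begin{cases} 0, & i = 0, 3, \\ \tfrac{1}{2}, & i = 1, 2. \end{cases}$$

Similarly,

$$P\{X = i \mid Y = 3\} = \begin{cases} \tfrac{1}{2}, & \text{if } i = 0, 3, \\ 0, & \text{if } i = 1, 2, \end{cases}$$

$$P\{Y = j \mid X = 0\} = \begin{cases} 0, & \text{if } j = 1, \\ 1, & \text{if } j = 3, \end{cases}$$

and so on.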
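
Formula (12) translates directly into a short routine. The sketch below is illustrative only: the function name `conditional_x_given_y` and the dictionary representation of the joint PMF are assumptions made here, and the joint PMF is the one obtained for the coin-tossing experiment of Example 4.

```python
from fractions import Fraction

# Joint PMF of Example 4, keyed by (x, y); pairs not listed have probability 0.
joint = {(0, 3): Fraction(1, 8), (1, 1): Fraction(3, 8),
         (2, 1): Fraction(3, 8), (3, 3): Fraction(1, 8)}

def conditional_x_given_y(joint, y):
    """Return {x: P{X = x | Y = y}} as in (12); requires P{Y = y} > 0."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: p / p_y for (x, yj), p in joint.items() if yj == y}

cond_y1 = conditional_x_given_y(joint, 1)
cond_y3 = conditional_x_given_y(joint, 3)
print("P{X = x | Y = 1}:", cond_y1)
print("P{X = x | Y = 3}:", cond_y3)
```

Values of x that do not appear in the returned dictionary have conditional probability zero.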




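Next suppose that $(X, Y)$ is an RV of the continuous type with joint PDF $f$. Since
$P\{X = x\} = 0$, $P\{Y = y\} = 0$ for any $x, y$, the probability $P\{X \le x \mid Y = y\}$,
or $P\{Y \le y \mid X = x\}$, is not defined. Let $\varepsilon > 0$, and suppose that $P\{y - \varepsilon < Y \le y + \varepsilon\} > 0$.
For every $x$ and every interval $(y - \varepsilon, y + \varepsilon]$, consider the conditional
probability of the event $\{X \le x\}$, given that $Y \in (y - \varepsilon, y + \varepsilon]$. We have

$$P\{X \le x \mid y - \varepsilon < Y \le y + \varepsilon\} = \frac{P\{X \le x,\ y - \varepsilon < Y \le y + \varepsilon\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}}.$$

For any fixed interval $(y - \varepsilon, y + \varepsilon]$, the expression above defines the conditional DF
of $X$ given that $Y \in (y - \varepsilon, y + \varepsilon]$, provided that $P\{Y \in (y - \varepsilon, y + \varepsilon]\} > 0$. We
shall be interested in the case where the limit

$$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\}$$

exists.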


Definition 10. The conditional DF of an RV $X$, given $Y = y$, is defined as the
limit

(13) $$\lim_{\varepsilon \to 0+} P\{X \le x \mid Y \in (y - \varepsilon, y + \varepsilon]\},$$

provided that the limit exists. If the limit exists, we denote it by $F_{X|Y}(x \mid y)$, and define
the conditional density function of $X$, given $Y = y$, $f_{X|Y}(x \mid y)$, as a nonnegative
function satisfying

(14) $$F_{X|Y}(x \mid y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\,dt \quad \text{for all } x \in \mathcal{R}.$$

For fixed $y$ we see that $f_{X|Y}(x \mid y) \ge 0$ and $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = 1$. Thus
$f_{X|Y}(x \mid y)$ is a PDF for fixed $y$.

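Suppose that $(X, Y)$ is an RV of the continuous type with PDF $f$. At every point
$(x, y)$ where $f$ is continuous and the marginal PDF $f_2(y) > 0$ and is continuous, we
have

$$F_{X|Y}(x \mid y) = \lim_{\varepsilon \to 0+} \frac{P\{X \le x,\ Y \in (y - \varepsilon, y + \varepsilon]\}}{P\{Y \in (y - \varepsilon, y + \varepsilon]\}} = \lim_{\varepsilon \to 0+} \frac{\int_{-\infty}^{x} \left[ \int_{y - \varepsilon}^{y + \varepsilon} f(u, v)\,dv \right] du}{\int_{y - \varepsilon}^{y + \varepsilon} f_2(v)\,dv}.$$

Dividing numerator and denominator by $2\varepsilon$ and passing to the limit as $\varepsilon \to 0+$, we
have

$$F_{X|Y}(x \mid y) = \frac{\int_{-\infty}^{x} f(u, y)\,du}{f_2(y)} = \int_{-\infty}^{x} \left[ \frac{f(u, y)}{f_2(y)} \right] du.$$

It follows that there exists a conditional PDF of $X$, given $Y = y$, that is expressed by

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}, \qquad f_2(y) > 0.$$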


We have thus proved the following theorem. 


Theorem 6. Let $f$ be the PDF of an RV $(X, Y)$ of the continuous type, and let
$f_2$ be the marginal PDF of $Y$. At every point $(x, y)$ at which $f$ is continuous and
$f_2(y) > 0$ and is continuous, the conditional PDF of $X$, given $Y = y$, exists and is
expressed by

(15) $$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}.$$

Note that

$$\int_{-\infty}^{x} f(u, y)\,du = f_2(y)\,F_{X|Y}(x \mid y),$$

so that

$$F_1(x) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{x} f(u, y)\,du \right] dy = \int_{-\infty}^{\infty} f_2(y)\,F_{X|Y}(x \mid y)\,dy,$$

where $F_1$ is the marginal DF of $X$.


It is clear that similar definitions may be made for the conditional DF and conditional
PDF of the RV $Y$, given $X = x$, and an analog of Theorem 6 holds.

In the general case, let $(X_1, X_2, \ldots, X_n)$ be an n-dimensional RV of the continuous
type with PDF $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$. Also, let $\{i_1 < i_2 < \cdots < i_k,\ j_1 < j_2 < \cdots < j_l\}$
be a subset of $\{1, 2, \ldots, n\}$. Then

$$F(x_{i_1}, \ldots, x_{i_k} \mid x_{j_1}, \ldots, x_{j_l}) = \frac{\int_{-\infty}^{x_{i_1}} \cdots \int_{-\infty}^{x_{i_k}} f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}(u_{i_1}, \ldots, u_{i_k}, x_{j_1}, \ldots, x_{j_l}) \prod_{p=1}^{k} du_{i_p}}{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}(u_{i_1}, \ldots, u_{i_k}, x_{j_1}, \ldots, x_{j_l}) \prod_{p=1}^{k} du_{i_p}},$$

provided that the denominator exceeds 0. Here $f_{X_{i_1}, \ldots, X_{i_k}, X_{j_1}, \ldots, X_{j_l}}$ is the joint
marginal PDF of $(X_{i_1}, X_{i_2}, \ldots, X_{i_k}, X_{j_1}, X_{j_2}, \ldots, X_{j_l})$. The conditional densities
are obtained in a similar manner.

The case in which $(X_1, X_2, \ldots, X_n)$ is of the discrete type is treated similarly.


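Example 7. For the joint PDF of Example 5, we have

$$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{1}{1 - x}, \qquad x < y < 1,$$

so that the conditional PDF $f_{Y|X}$ is uniform on $(x, 1)$. Also,

$$f_{X|Y}(x \mid y) = \frac{1}{y}, \qquad 0 < x < y,$$

which is uniform on $(0, y)$. Thus

$$P\{Y \ge \tfrac{1}{2} \mid X = \tfrac{1}{2}\} = \int_{1/2}^{1} \frac{1}{1 - \tfrac{1}{2}}\,dy = 1,$$

$$P\{X \ge \tfrac{1}{3} \mid Y = \tfrac{2}{3}\} = \int_{1/3}^{2/3} \frac{1}{2/3}\,dx = \frac{1}{2}.$$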


We conclude this section with a discussion of a technique called truncation. We 
consider two types of truncation, each with a different objective. In probabilistic 
modeling we use truncated distributions when sampling from an incomplete popu- 
lation. 


Definition 11. Let $X$ be an RV on $(\Omega, \mathcal{S}, P)$, and $T \in \mathfrak{B}$ such that $0 < P\{X \in T\} < 1$.
Then the conditional distribution $P\{X \le x \mid X \in T\}$, defined for any real
$x$, is called the truncated distribution of $X$.


If $X$ is a discrete RV with PMF $p_i = P\{X = x_i\}$, $i = 1, 2, \ldots$, the truncated
distribution of $X$ is given by

(18) $$P\{X = x_i \mid X \in T\} = \frac{P\{X = x_i, X \in T\}}{P\{X \in T\}} = \begin{cases} \dfrac{p_i}{\sum_{x_j \in T} p_j} & \text{if } x_i \in T, \\ 0 & \text{otherwise.} \end{cases}$$

If $X$ is of the continuous type with PDF $f$, then

(19) $$P\{X \le x \mid X \in T\} = \frac{P\{X \le x, X \in T\}}{P\{X \in T\}} = \frac{\int_{(-\infty, x] \cap T} f(y)\,dy}{\int_{T} f(y)\,dy}.$$

The PDF of the truncated distribution is given by

(20) $$h(x) = \begin{cases} \dfrac{f(x)}{\int_{T} f(y)\,dy}, & x \in T, \\ 0, & x \notin T. \end{cases}$$

Here $T$ is not necessarily a bounded set of real numbers. If we write $Y$ for the RV
with distribution function $P\{X \le x \mid X \in T\}$, then $Y$ has support $T$.


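Example 8. Let $X$ be an RV with standard normal PDF

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

Let $T = (-\infty, 0]$. Then $P\{X \in T\} = \tfrac{1}{2}$, since $X$ is symmetric and continuous. For
the truncated PDF, we have

$$h(x) = \begin{cases} 2f(x), & -\infty < x \le 0, \\ 0, & x > 0. \end{cases}$$

Some other examples are the truncated Poisson distribution

$$P\{X = k\} = \frac{e^{-\lambda}}{1 - e^{-\lambda}}\, \frac{\lambda^k}{k!}, \qquad k = 1, 2, \ldots,$$

where $T = \{X \ge 1\}$, and the truncated uniform distribution

$$f(x) = \frac{1}{\theta}, \quad 0 < x < \theta, \quad \text{and } = 0 \text{ otherwise},$$

where $T = \{X < \theta\}$, $\theta > 0$.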
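
As a numerical sanity check on the zero-truncated Poisson distribution, the sketch below sums the truncated PMF and computes its mean. It is only an illustration under stated assumptions: the function name `truncated_poisson_pmf`, the parameter value `lam = 2.0`, and the cutoff of 100 terms (beyond which the terms are negligible for this parameter) are choices made here, not part of the text.

```python
import math

def truncated_poisson_pmf(k, lam):
    """P{X = k | X >= 1} for a Poisson(lam) RV X; zero for k < 1."""
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam**k / (math.factorial(k) * (1.0 - math.exp(-lam)))

lam = 2.0
terms = range(0, 100)  # the tail beyond 100 is negligible for lam = 2
total = sum(truncated_poisson_pmf(k, lam) for k in terms)
mean = sum(k * truncated_poisson_pmf(k, lam) for k in terms)

print("total mass:", total)
print("mean:", mean, "vs lam / (1 - exp(-lam)):", lam / (1.0 - math.exp(-lam)))
```

The mean agrees with $\lambda / (1 - e^{-\lambda})$, which follows from dividing the untruncated mean $\lambda$ by $P\{X \ge 1\}$, since the term $k = 0$ contributes nothing to the sum.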


The second type of truncation is very useful in probability limit theory, especially
when the DF $F$ in question does not have a finite mean. Let $a < b$ be finite real
numbers. Define the RV $X^*$ by

$$X^* = \begin{cases} X & \text{if } a \le X \le b, \\ 0 & \text{if } X < a \text{ or } X > b. \end{cases}$$

This method produces an RV for which $P\{a \le X^* \le b\} = 1$ so that $X^*$ has moments
of all orders. The special case when $b = c > 0$ and $a = -c$ is quite useful in
probability limit theory when we wish to approximate $X$ through bounded RVs. We
say that $X^c$ is $X$ truncated at $c$ if $X^c = X$ for $|X| \le c$, and $= 0$ for $|X| > c$. Then
$E|X^c|^k \le c^k$. Moreover,

$$P\{X \ne X^c\} = P\{|X| > c\},$$

so that $c$ can be selected sufficiently large to make $P\{|X| > c\}$ arbitrarily small. For
example, if $E|X|^2 < \infty$, then

$$P\{|X| > c\} \le \frac{E|X|^2}{c^2},$$

and given $\varepsilon > 0$, we can choose $c$ such that $E|X|^2 / c^2 < \varepsilon$.
The distribution of $X^c$ is no longer the truncated distribution $P\{X \le x \mid |X| \le c\}$.
In fact,

$$F^c(y) = \begin{cases} 0, & y \le -c, \\ F(y) - F(-c), & -c < y < 0, \\ 1 - F(c) + F(y), & 0 \le y < c, \\ 1, & y \ge c, \end{cases}$$

where $F$ is the DF of $X$ and $F^c$ is that of $X^c$.
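A third type of truncation, sometimes called Winsorization, sets

$$X^* = X \ \text{ if } a < X < b, \qquad = a \ \text{ if } X \le a, \qquad \text{and} \quad = b \ \text{ if } X \ge b.$$

This method also produces an RV for which $P(a \le X^* \le b) = 1$, moments of all
orders for $X^*$ exist, but its DF is given by

$$F^*(y) = 0 \ \text{ for } y < a, \qquad = F(y) \ \text{ for } a \le y < b, \qquad = 1 \ \text{ for } y \ge b.$$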
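
The difference between the second type of truncation and Winsorization is easiest to see on a handful of numbers. The following sketch is illustrative; the function names `truncate` and `winsorize` and the sample values are hypothetical choices made here.

```python
def truncate(x, a, b):
    """Second type of truncation: keep x on [a, b], replace it by 0 outside."""
    return x if a <= x <= b else 0.0

def winsorize(x, a, b):
    """Winsorization: values below a become a, values above b become b."""
    return min(max(x, a), b)

sample = [-7.5, -1.0, 0.3, 2.0, 12.0]
c = 2.0  # symmetric case a = -c, b = c

truncated = [truncate(x, -c, c) for x in sample]
winsorized = [winsorize(x, -c, c) for x in sample]

print("truncated at c: ", truncated)
print("winsorized at c:", winsorized)
```

Truncation moves the mass outside $[-c, c]$ to the origin, whereas Winsorization piles it up at the endpoints, which is why the two DFs displayed above differ.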


PROBLEMS 4.2 


1. Let $F(x, y) = 1$ if $x + 2y \ge 1$, and $= 0$ if $x + 2y < 1$. Does $F$ define a DF in
the plane?


2. Let $T$ be a closed triangle in the plane with vertices $(0, 0)$, $(0, \sqrt{2})$, and
$(\sqrt{2}, \sqrt{2})$. Let $F(x, y)$ denote the elementary area of the intersection of $T$
with $\{(x_1, x_2): x_1 \le x, x_2 \le y\}$. Show that $F$ defines a DF in the plane, and
find its marginal DFs.

3. Let $(X, Y)$ have the joint PDF $f$ defined by $f(x, y) = \tfrac{1}{2}$ inside the square with
corners at the points $(1, 0)$, $(0, 1)$, $(-1, 0)$, and $(0, -1)$ in the $(x, y)$-plane, and
$= 0$ otherwise. Find the marginal PDFs of $X$ and $Y$ and the two conditional
PDFs.


4. Let $f(x, y, z) = e^{-x-y-z}$, $x > 0$, $y > 0$, $z > 0$, and $= 0$ otherwise, be the joint
PDF of $(X, Y, Z)$. Compute $P\{X < Y < Z\}$ and $P\{X = Y < Z\}$.


5. Let $(X, Y)$ have the joint PDF $f(x, y) = \tfrac{3}{4}[xy + (x^2/2)]$ if $0 < x < 1$, $0 < y < 2$,
and $= 0$ otherwise. Find $P\{Y < 1 \mid X < \tfrac{1}{2}\}$.

6. For DFs $F, F_1, F_2, \ldots, F_n$ show that

$$1 - \sum_{i=1}^{n} \{1 - F_i(x_i)\} \le F(x_1, x_2, \ldots, x_n) \le \min_{1 \le i \le n} F_i(x_i)$$

for all real numbers $x_1, x_2, \ldots, x_n$ if and only if the $F_i$'s are marginal DFs of $F$.


7. For the bivariate negative binomial distribution

$$P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\, y!\, (k - 1)!}\, p_1^x\, p_2^y\, (1 - p_1 - p_2)^k,$$

where $x, y = 0, 1, 2, \ldots$, $k \ge 1$ is an integer, $0 < p_1 < 1$, $0 < p_2 < 1$,
and $p_1 + p_2 < 1$, find the marginal PMFs of $X$ and $Y$ and the conditional
distributions.


In Problems 8 to 10, the bivariate distributions considered are not unique gener- 
alizations of the corresponding univariate distributions. 


8. For the bivariate Cauchy RV $(X, Y)$ with PDF

$$f(x, y) = \frac{c}{2\pi}\,(c^2 + x^2 + y^2)^{-3/2}, \qquad -\infty < x < \infty, \quad -\infty < y < \infty, \quad c > 0,$$

find the marginal PDFs of $X$ and $Y$. Find the conditional PDF of $Y$ given $X = x$.


9. For the bivariate beta RV $(X, Y)$ with PDF

$$f(x, y) = \frac{\Gamma(p_1 + p_2 + p_3)}{\Gamma(p_1)\,\Gamma(p_2)\,\Gamma(p_3)}\, x^{p_1 - 1}\, y^{p_2 - 1}\, (1 - x - y)^{p_3 - 1}, \qquad x \ge 0, \quad y \ge 0, \quad x + y \le 1,$$

where $p_1, p_2, p_3$ are positive real numbers, find the marginal PDFs of $X$ and $Y$
and the conditional PDFs. Find also the conditional PDF of $Y/(1 - X)$, given
$X = x$.


10. For the bivariate gamma RV $(X, Y)$ with PDF

$$f(x, y) = \frac{\beta^{\alpha + \gamma}}{\Gamma(\alpha)\,\Gamma(\gamma)}\, x^{\alpha - 1}\, (y - x)^{\gamma - 1}\, e^{-\beta y}, \qquad 0 < x < y; \quad \alpha, \beta, \gamma > 0,$$

find the marginal PDFs of $X$ and $Y$ and the conditional PDFs. Also, find the
conditional PDF of $Y - X$ given $X = x$, and the conditional distribution of $X/Y$
given $Y = y$.


11. For the bivariate hypergeometric RV $(X, Y)$ with PMF

$$P\{X = x, Y = y\} = \binom{N}{n}^{-1} \binom{Np_1}{x} \binom{Np_2}{y} \binom{N - Np_1 - Np_2}{n - x - y}, \qquad x, y = 0, 1, 2, \ldots, n,$$

where $x \le Np_1$, $y \le Np_2$, $n - x - y \le N(1 - p_1 - p_2)$, $N, n$ integers with
$n \le N$, and $0 < p_1 < 1$, $0 < p_2 < 1$ so that $p_1 + p_2 \le 1$, find the marginal
PMFs of $X$ and $Y$ and the conditional PMFs.


12. Let X be an RV with PDF f(x) = 1 if 0 < x < 1, and = 0 otherwise. Let
T = {x: <x< 5}. Find the PDF of the truncated distribution of X, its 
mean, and its variance.


13. Let $X$ be an RV with PMF

$$P\{X = x\} = e^{-\lambda}\, \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots, \quad \lambda > 0.$$

Suppose that the value $x = 0$ cannot be observed. Find the PMF of the truncated
RV, its mean, and its variance.


14. Is the function

$$f(x, y, z, u) = \begin{cases} \exp(-u), & 0 < x < y < z < u < \infty, \\ 0, & \text{elsewhere} \end{cases}$$

a joint density function? If so, find $P(X \le 7)$, where $(X, Y, Z, U)$ is a random
variable with density $f$.
15. Show that the function defined by

$$f(x, y, z, u) = \frac{24}{(1 + x + y + z + u)^5}, \qquad x > 0, \ y > 0, \ z > 0, \ u > 0,$$

and zero elsewhere is a joint density function.
(a) Find $P(X > Y > Z > U)$.
(b) Find $P(X + Y + Z + U \ge 1)$.


16. Let $(X, Y)$ have joint density function $f$ and joint distribution function $F$. Suppose
that

$$f(x_1, y_1)\, f(x_2, y_2) \le f(x_1, y_2)\, f(x_2, y_1)$$

holds for $x_1 \le a \le x_2$ and $y_1 \le b \le y_2$. Show that

$$F(a, b) \le F_1(a)\, F_2(b).$$


17. Suppose that $(X, Y, Z)$ are jointly distributed with density

$$f(x, y, z) = \begin{cases} g(x)\,g(y)\,g(z), & x > 0, \ y > 0, \ z > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

Find $P(X > Y > Z)$. Hence find the probability that $(x, y, z) \notin \{X > Y > Z\}$
or $\{X < Y < Z\}$. (Here $g$ is a density function on $\mathcal{R}$.)


We recall that the joint distribution of a multiple RV uniquely determines the
marginal distributions of the component random variables, but in general, knowledge
of marginal distributions is not enough to determine the joint distribution. Indeed,
it is quite possible to have an infinite collection of joint densities $f_\alpha$ with given
marginal densities.


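Example 1 (Gumbel [36]). Let $f_1, f_2, f_3$ be three PDFs with corresponding DFs
$F_1, F_2, F_3$, and let $\alpha$ be a constant, $|\alpha| \le 1$. Define

$$f_\alpha(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\, \{1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1]\}.$$

We show that $f_\alpha$ is a PDF for each $\alpha$ in $[-1, 1]$ and that the collection of densities
$\{f_\alpha: -1 \le \alpha \le 1\}$ has the same marginal densities $f_1, f_2, f_3$. First note that

$$\bigl| [2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1] \bigr| \le 1,$$

so that

$$1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1][2F_3(x_3) - 1] \ge 0.$$

Also,

$$\iiint f_\alpha(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = 1 + \alpha \left( \int [2F_1(x_1) - 1] f_1(x_1)\,dx_1 \right) \left( \int [2F_2(x_2) - 1] f_2(x_2)\,dx_2 \right) \left( \int [2F_3(x_3) - 1] f_3(x_3)\,dx_3 \right)$$

$$= 1 + \alpha \left\{ \left[ F_1^2(x_1) \Big|_{-\infty}^{\infty} - 1 \right] \left[ F_2^2(x_2) \Big|_{-\infty}^{\infty} - 1 \right] \left[ F_3^2(x_3) \Big|_{-\infty}^{\infty} - 1 \right] \right\} = 1.$$

It follows that $f_\alpha$ is a density function. That $f_1, f_2, f_3$ are the marginal densities of
$f_\alpha$ follows similarly.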
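
A concrete instance of this construction can be checked numerically. The sketch below assumes, purely for illustration, that all three marginals are uniform on (0, 1), so that $f_i = 1$ and $F_i(x) = x$ there; the function names and the grid size are choices made here. Because the integrand is linear in each variable, the midpoint rule is exact up to rounding.

```python
def f_alpha(x1, x2, x3, alpha):
    """Gumbel-type density with three uniform(0, 1) marginals (assumed case)."""
    return 1.0 + alpha * (2 * x1 - 1) * (2 * x2 - 1) * (2 * x3 - 1)

def midpoints(n):
    """Midpoints of n equal subintervals of (0, 1)."""
    return [(i + 0.5) / n for i in range(n)]

def total_mass(alpha, n=20):
    """Midpoint-rule integral of f_alpha over the unit cube."""
    pts = midpoints(n)
    return sum(f_alpha(a, b, c, alpha) for a in pts for b in pts for c in pts) / n**3

def marginal_x1(x1, alpha, n=20):
    """Midpoint-rule integral of f_alpha over (x2, x3), i.e. the marginal density at x1."""
    pts = midpoints(n)
    return sum(f_alpha(x1, b, c, alpha) for b in pts for c in pts) / n**2

for alpha in (-1.0, -0.5, 0.0, 0.7, 1.0):
    print(alpha, total_mass(alpha), marginal_x1(0.3, alpha))
```

For every value of $\alpha$ tried, the total mass and the marginal density at the chosen point both come out as 1, although the joint densities themselves differ.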


In this section we deal with a very special class of distributions in which the 
marginal distributions uniquely determine the joint distribution of a multiple RV. 
First we consider the bivariate case. 

Let $F(x, y)$ and $F_1(x)$, $F_2(y)$, respectively, be the joint DF of $(X, Y)$ and the
marginal DFs of $X$ and $Y$.


Definition 1. We say that $X$ and $Y$ are independent if and only if

(1) $$F(x, y) = F_1(x)\,F_2(y) \quad \text{for all } (x, y) \in \mathcal{R}_2.$$


Lemma 1. If $X$ and $Y$ are independent and $a < c$, $b < d$ are real numbers, then

(2) $$P\{a < X \le c,\ b < Y \le d\} = P\{a < X \le c\}\,P\{b < Y \le d\}.$$


Theorem 1

(a) A necessary and sufficient condition for RVs $X$, $Y$ of the discrete type to be
independent is that

(3) $$P\{X = x_i, Y = y_j\} = P\{X = x_i\}\,P\{Y = y_j\}$$

for all pairs $(x_i, y_j)$.
(b) Two RVs $X$ and $Y$ of the continuous type are independent if and only if

(4) $$f(x, y) = f_1(x)\,f_2(y) \quad \text{for all } (x, y) \in \mathcal{R}_2,$$

where $f, f_1, f_2$, respectively, are the joint and marginal densities of $X$ and $Y$,
and $f$ is everywhere continuous.


Proof. (a) Let $X$, $Y$ be independent. Then from Lemma 1, letting $a \to c$ and
$b \to d$, we get

$$P\{X = c, Y = d\} = P\{X = c\}\,P\{Y = d\}.$$

Conversely,

$$F(x, y) = \sum_{B} P\{X = x_i, Y = y_j\},$$

where

$$B = \{(i, j): x_i \le x,\ y_j \le y\}.$$

Then

$$F(x, y) = \sum_{B} P\{X = x_i\}\,P\{Y = y_j\} = \sum_{x_i \le x} \left[ \sum_{y_j \le y} P\{Y = y_j\} \right] P\{X = x_i\} = F_1(x)\,F_2(y).$$

The proof of part (b) is left as an exercise.


Corollary. Let $X$ and $Y$ be independent RVs; then $F_{Y|X}(y \mid x) = F_Y(y)$ for all
$y$, and $F_{X|Y}(x \mid y) = F_X(x)$ for all $x$.


Theorem 1. The RVs $X$ and $Y$ are independent if and only if

(5) $$P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\,P\{Y \in A_2\}$$

for all Borel sets $A_1$ on the x-axis and $A_2$ on the y-axis.


Theorem 2. Let $X$ and $Y$ be independent RVs and $f$ and $g$ be Borel-measurable
functions. Then $f(X)$ and $g(Y)$ are also independent.


Proof. We have

$$P\{f(X) \le x,\ g(Y) \le y\} = P\{X \in f^{-1}(-\infty, x],\ Y \in g^{-1}(-\infty, y]\}$$
$$= P\{X \in f^{-1}(-\infty, x]\}\,P\{Y \in g^{-1}(-\infty, y]\}$$
$$= P\{f(X) \le x\}\,P\{g(Y) \le y\}.$$

Note that a degenerate RV is independent of any RV.
Example 2. Let $X$ and $Y$ be jointly distributed with PDF

$$f(x, y) = \begin{cases} \dfrac{1 + xy}{4}, & |x| < 1, \ |y| < 1, \\ 0, & \text{otherwise.} \end{cases}$$

Then $X$ and $Y$ are not independent since $f_1(x) = \tfrac{1}{2}$, $|x| < 1$, and $f_2(y) = \tfrac{1}{2}$,
$|y| < 1$, are the marginal densities of $X$ and $Y$, respectively. However, the RVs $X^2$
and $Y^2$ are independent. Indeed,

$$P\{X^2 \le u, Y^2 \le v\} = \int_{-v^{1/2}}^{v^{1/2}} \int_{-u^{1/2}}^{u^{1/2}} f(x, y)\,dx\,dy$$
$$= \frac{1}{4} \int_{-v^{1/2}}^{v^{1/2}} \left[ \int_{-u^{1/2}}^{u^{1/2}} (1 + xy)\,dx \right] dy$$
$$= u^{1/2} v^{1/2}$$
$$= P\{X^2 \le u\}\,P\{Y^2 \le v\}.$$

Note that $\varphi(X^2)$ and $\psi(Y^2)$ are independent where $\varphi$ and $\psi$ are Borel-measurable
functions. But $X$ is not a Borel-measurable function of $X^2$.
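
The claim of Example 2 can be checked numerically. The sketch below approximates $P\{X^2 \le u, Y^2 \le v\}$ with a midpoint rule and compares it with $u^{1/2} v^{1/2}$; the function names, the grid size, and the test point are illustrative assumptions, not part of the text.

```python
import math

def joint_pdf(x, y):
    """Joint PDF of Example 2: (1 + x*y)/4 on the open square |x| < 1, |y| < 1."""
    return (1.0 + x * y) / 4.0 if abs(x) < 1 and abs(y) < 1 else 0.0

def prob_squares(u, v, n=400):
    """Midpoint-rule approximation of P{X^2 <= u, Y^2 <= v} for 0 < u, v <= 1."""
    a, b = math.sqrt(u), math.sqrt(v)
    hx, hy = 2 * a / n, 2 * b / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * hx
        for j in range(n):
            y = -b + (j + 0.5) * hy
            total += joint_pdf(x, y)
    return total * hx * hy

u, v = 0.25, 0.49
print(prob_squares(u, v), "vs", math.sqrt(u) * math.sqrt(v))

# X and Y themselves are not independent: the joint density is not the
# product of the marginals (each equal to 1/2 on (-1, 1)).
print(joint_pdf(0.5, 0.5), "vs", 0.5 * 0.5)
```

The $xy$ term integrates to zero over any rectangle symmetric about the origin, which is exactly why the dependence between $X$ and $Y$ disappears once only $X^2$ and $Y^2$ are observed.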


Example 3. We return to Buffon's needle problem, discussed in Examples 1.2.9
and 1.3.7. Suppose that the RV $R$, which represents the distance from the center of
the needle to the nearest line, is uniformly distributed on $(0, l]$. Suppose further that
$\Theta$, the angle that the needle forms with this line, is distributed uniformly on $[0, \pi)$.
If $R$ and $\Theta$ are assumed to be independent, the joint PDF is given by

$$f_{R,\Theta}(r, \theta) = f_R(r)\,f_\Theta(\theta) = \begin{cases} \dfrac{1}{l} \cdot \dfrac{1}{\pi} & \text{if } 0 < r \le l, \ 0 \le \theta < \pi, \\ 0 & \text{otherwise.} \end{cases}$$

The needle will intersect the nearest line if and only if

$$\frac{l}{2} \sin \Theta \ge R.$$

Therefore, the required probability is given by

$$P\left\{ \sin \Theta \ge \frac{2R}{l} \right\} = \int_{0}^{\pi} \int_{0}^{(l/2)\sin\theta} f_{R,\Theta}(r, \theta)\,dr\,d\theta = \frac{1}{l\pi} \int_{0}^{\pi} \frac{l}{2} \sin\theta\,d\theta = \frac{1}{\pi}.$$
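
The value $1/\pi$ can be confirmed by evaluating the last integral numerically. This is a minimal sketch; the function name `buffon_probability`, the default value of `l`, and the number of subintervals are assumptions made for illustration.

```python
import math

def buffon_probability(l=1.0, n=100_000):
    """Midpoint-rule value of (1/(l*pi)) * integral over [0, pi] of (l/2) sin(theta)."""
    h = math.pi / n
    inner = sum((l / 2.0) * math.sin((k + 0.5) * h) for k in range(n))
    return inner * h / (l * math.pi)

print(buffon_probability(), "vs 1/pi =", 1.0 / math.pi)
print(buffon_probability(l=3.0))  # the answer does not depend on l
```

The parameter $l$ cancels, as the closed-form computation also shows.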
Definition 2. A collection of jointly distributed RVs $X_1, X_2, \ldots, X_n$ is said to
be mutually or completely independent if and only if

(6) $$F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F_i(x_i) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n,$$

where $F$ is the joint DF of $(X_1, X_2, \ldots, X_n)$, and $F_i$ $(i = 1, 2, \ldots, n)$ is the
marginal DF of $X_i$. $X_1, \ldots, X_n$ are said to be pairwise independent if and only if
every pair of them are independent.


It is clear that an analog of Theorem 1 holds, but we leave it to the reader to 
construct it. 


Example 4. In Example 1 we cannot write

$$f_\alpha(x_1, x_2, x_3) = f_1(x_1)\,f_2(x_2)\,f_3(x_3)$$

except when $\alpha = 0$. It follows that $X_1$, $X_2$, and $X_3$ are not independent except when
$\alpha = 0$.


The following result is easy to prove. 


Theorem 3. If $X_1, X_2, \ldots, X_n$ are independent, every subcollection $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$
of $X_1, X_2, \ldots, X_n$ is also independent.


Remark 1. It is quite possible for RVs $X_1, X_2, \ldots, X_n$ to be pairwise independent
without being mutually independent. Let $(X, Y, Z)$ have the joint PMF defined
by

$$P\{X = x, Y = y, Z = z\} = \begin{cases} \tfrac{3}{16} & \text{if } (x, y, z) \in \{(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)\}, \\ \tfrac{1}{16} & \text{if } (x, y, z) \in \{(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)\}. \end{cases}$$

Clearly, $X, Y, Z$ are not independent. (Why?) We have

$$P\{X = x, Y = y\} = \tfrac{1}{4}, \quad (x, y) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\},$$
$$P\{Y = y, Z = z\} = \tfrac{1}{4}, \quad (y, z) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\},$$
$$P\{X = x, Z = z\} = \tfrac{1}{4}, \quad (x, z) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\},$$
$$P\{X = x\} = \tfrac{1}{2}, \quad x = 0, \ x = 1,$$
$$P\{Y = y\} = \tfrac{1}{2}, \quad y = 0, \ y = 1,$$

and

$$P\{Z = z\} = \tfrac{1}{2}, \quad z = 0, \ z = 1.$$

It follows that $X$ and $Y$, $Y$ and $Z$, and $X$ and $Z$ are pairwise independent.
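
The distinction between pairwise and mutual independence can be verified mechanically for a PMF of this shape. In the sketch below the two probability levels are taken to be 3/16 and 1/16, which is an assumption; any pair $a \ne b$ with $a + b = 1/4$ leads to the same conclusion. The helper names (`marginal`, `pairwise_independent`, `mutually_independent`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed probability levels: a on the "high" points, b on the remaining ones.
a, b = Fraction(3, 16), Fraction(1, 16)
high = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
pmf = {pt: (a if pt in high else b) for pt in product((0, 1), repeat=3)}

def marginal(pmf, keep):
    """Marginal PMF of the coordinates whose indices are listed in `keep`."""
    out = {}
    for pt, p in pmf.items():
        key = tuple(pt[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out

singles = [marginal(pmf, (i,)) for i in range(3)]

def pairwise_independent():
    """Check the product rule for every pair of coordinates."""
    for i, j in ((0, 1), (1, 2), (0, 2)):
        for (u, v), p in marginal(pmf, (i, j)).items():
            if p != singles[i][(u,)] * singles[j][(v,)]:
                return False
    return True

def mutually_independent():
    """Check the product rule for the full three-dimensional PMF."""
    return all(p == singles[0][(x,)] * singles[1][(y,)] * singles[2][(z,)]
               for (x, y, z), p in pmf.items())

print("pairwise independent:", pairwise_independent())
print("mutually independent:", mutually_independent())
```

Every two-dimensional marginal equals 1/4 at each point, while the three-dimensional PMF never equals 1/8, which answers the "(Why?)" in the remark.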


Definition 3. A sequence $\{X_n\}$ of RVs is said to be independent if for every $n = 2, 3, 4, \ldots$
the RVs $X_1, X_2, \ldots, X_n$ are independent.


Similarly, one can speak of an independent family of RVs. 


Definition 4. We say that RVs $X$ and $Y$ are identically distributed if $X$ and $Y$
have the same DF, that is,

$$F_X(x) = F_Y(x) \quad \text{for all } x \in \mathcal{R},$$

where $F_X$ and $F_Y$ are the DFs of $X$ and $Y$, respectively.

Definition 5. We say that $\{X_n\}$ is a sequence of independent, identically distributed
(iid) RVs with common law $\mathcal{L}(X)$ if $\{X_n\}$ is an independent sequence of
RVs and the distribution of $X_n$ $(n = 1, 2, \ldots)$ is the same as that of $X$.

According to Definition 4, $X$ and $Y$ are identically distributed if and only if they
have the same distribution. It does not follow that $X = Y$ with probability 1 (see
Problem 7). If $P\{X = Y\} = 1$, we say that $X$ and $Y$ are equivalent RVs. All Definition 4
says is that $X$ and $Y$ are identically distributed if and only if

$$P\{X \in A\} = P\{Y \in A\} \quad \text{for all } A \in \mathfrak{B}.$$

Nothing is said about the equality of events $\{X \in A\}$ and $\{Y \in A\}$.


Definition 6. Two multiple RVs $(X_1, X_2, \ldots, X_m)$ and $(Y_1, Y_2, \ldots, Y_n)$ are said
to be independent if

(7) $$F(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) = F_1(x_1, x_2, \ldots, x_m)\,F_2(y_1, y_2, \ldots, y_n)$$

for all $(x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n) \in \mathcal{R}_{m+n}$, where $F, F_1, F_2$ are the joint
distribution functions of $(X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n)$, $(X_1, X_2, \ldots, X_m)$,
and $(Y_1, Y_2, \ldots, Y_n)$, respectively.


Of course, the independence of $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$
does not imply the independence of components $X_1, X_2, \ldots, X_m$ of $\mathbf{X}$ or components
$Y_1, Y_2, \ldots, Y_n$ of $\mathbf{Y}$.


Theorem 4. Let $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$ be independent
RVs. Then the component $X_j$ of $\mathbf{X}$ $(j = 1, 2, \ldots, m)$ and the component $Y_k$ of
$\mathbf{Y}$ $(k = 1, 2, \ldots, n)$ are independent RVs. If $h$ and $g$ are Borel-measurable functions,
$h(X_1, X_2, \ldots, X_m)$ and $g(Y_1, Y_2, \ldots, Y_n)$ are independent.


Remark 2. It is possible that an RV X may be independent of Y and also of 
Z, but X may not be independent of the random vector (Y, Z). See the example in 


Remark 1. 


Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed RVs with common
DF $F$. Then the joint DF $G$ of $(X_1, X_2, \ldots, X_n)$ is given by

$$G(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} F(x_j).$$

We note that for any of the $n!$ permutations $(x_{i_1}, x_{i_2}, \ldots, x_{i_n})$ of $(x_1, x_2, \ldots, x_n)$

$$G(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} F(x_j) = G(x_{i_1}, x_{i_2}, \ldots, x_{i_n}),$$

so that $G$ is a symmetric function of $x_1, x_2, \ldots, x_n$. Thus $(X_1, X_2, \ldots, X_n) \overset{d}{=} (X_{i_1}, X_{i_2}, \ldots, X_{i_n})$,
where $X \overset{d}{=} Y$ means that $X$ and $Y$ are identically distributed
RVs.

Definition 7. The RVs $X_1, X_2, \ldots, X_n$ are said to be exchangeable if

$$(X_1, X_2, \ldots, X_n) \overset{d}{=} (X_{i_1}, X_{i_2}, \ldots, X_{i_n})$$

for all $n!$ permutations $(i_1, i_2, \ldots, i_n)$ of $(1, 2, \ldots, n)$. The RVs in the sequence
$\{X_n\}$ are said to be exchangeable if $X_1, X_2, \ldots, X_n$ are exchangeable for each $n$.


Clearly if $X_1, X_2, \ldots, X_n$ are exchangeable, then the $X_i$ are identically distributed
but not necessarily independent.


Example 5. Suppose that $X, Y, Z$ have joint PDF

$$f(x, y, z) = \begin{cases} \tfrac{2}{3}(x + y + z), & 0 < x < 1, \ 0 < y < 1, \ 0 < z < 1, \\ 0, & \text{otherwise.} \end{cases}$$

Then $X, Y, Z$ are exchangeable but not independent.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Let $S_n = \sum_{j=1}^{n} X_j$, $n = 1, 2, \ldots$,
and $Y_k = X_k - S_n/n$, $k = 1, 2, \ldots, n - 1$. Then $Y_1, Y_2, \ldots, Y_{n-1}$ are exchangeable.


Theorem 5. Let X, Y be exchangeable RVs. Then X — Y has a symmetric dis- 
tribution. 


The proof is simple. 


Definition 8. Let $X$ be an RV, and let $X'$ be an RV that is independent of $X$ and
$X' \overset{d}{=} X$. We call the RV

$$X^s = X - X'$$

the symmetrized $X$.
In view of Theorem 5, $X^s$ is symmetric about zero so that

$$P\{X^s \ge 0\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X^s \le 0\} \ge \tfrac{1}{2}.$$

If $E|X| < \infty$, then $E|X^s| \le 2E|X| < \infty$, and $EX^s = 0$.
The technique of symmetrization is an important tool in the study of probability 
limit theorems. We will need the following result later. The proof is left to the reader. 


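Theorem 6. For $\varepsilon > 0$,

(a) $P\{|X^s| > \varepsilon\} \le 2P\{|X| > \varepsilon/2\}$.
(b) If $a \ge 0$ is such that $P\{X \ge a\} \le 1 - p$ and $P\{X \le -a\} \le 1 - p$, then

$$P\{|X^s| \ge \varepsilon\} \ge p\,P\{|X| > a + \varepsilon\}$$

for $\varepsilon > 0$.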
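
Symmetrization is easy to carry out exactly for a discrete RV. The sketch below builds the PMF of $X^s = X - X'$ from a small, deliberately asymmetric PMF (the values are hypothetical, chosen only for illustration) and checks symmetry about zero, $EX^s = 0$, and inequality (a) of Theorem 6 at a few values of $\varepsilon$.

```python
from fractions import Fraction

# A small asymmetric PMF for X (hypothetical example values).
pmf_x = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}

# PMF of X^s = X - X', where X' is an independent copy of X.
pmf_s = {}
for x, p in pmf_x.items():
    for x2, q in pmf_x.items():
        pmf_s[x - x2] = pmf_s.get(x - x2, Fraction(0)) + p * q

def tail(pmf, t):
    """P{|V| > t} for an RV V with the given PMF."""
    return sum(p for v, p in pmf.items() if abs(v) > t)

symmetric = all(pmf_s.get(-d, Fraction(0)) == p for d, p in pmf_s.items())
mean_s = sum(d * p for d, p in pmf_s.items())
inequality_holds = all(
    tail(pmf_s, eps) <= 2 * tail(pmf_x, Fraction(eps) / 2)
    for eps in (Fraction(1, 2), 1, 2, 3, 5)
)

print("symmetric about zero:", symmetric)
print("E X^s =", mean_s)
print("Theorem 6(a) holds at the tested values:", inequality_holds)
```

A finite check of this kind illustrates the inequality but does not prove it.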


PROBLEMS 4.3 


1. Let $A$ be a set of $k$ numbers and $\Omega$ be the set of all ordered samples of size $n$
from $A$ with replacement. Also, let $\mathcal{S}$ be the set of all subsets of $\Omega$ and $P$ be a
probability defined on $\mathcal{S}$. Let $X_1, X_2, \ldots, X_n$ be RVs defined on $(\Omega, \mathcal{S}, P)$ by
setting

$$X_j(a_1, a_2, \ldots, a_n) = a_j \quad (j = 1, 2, \ldots, n).$$

Show that $X_1, X_2, \ldots, X_n$ are independent if and only if each sample point is
equally likely.

2. Let $X_1, X_2$ be iid RVs with common PMF

$$P\{X = \pm 1\} = \tfrac{1}{2}.$$

Write $X_3 = X_1 X_2$. Show that $X_1, X_2, X_3$ are pairwise independent but not
independent.
3. Let $(X_1, X_2, X_3)$ be an RV with joint PMF

$$f(x_1, x_2, x_3) = \begin{cases} \tfrac{1}{4} & \text{if } (x_1, x_2, x_3) \in A, \\ 0 & \text{otherwise,} \end{cases}$$

where

$$A = \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\}.$$

Are $X_1, X_2, X_3$ independent? Are $X_1, X_2, X_3$ pairwise independent? Are $X_1 + X_2$
and $X_3$ independent?


4. Let $X$ and $Y$ be independent RVs such that $XY$ is degenerate at $c \ne 0$. That is,
$P(XY = c) = 1$. Show that $X$ and $Y$ are also degenerate.


5. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $A, B \in \mathcal{S}$. Define $X$ and $Y$ so that

$$X(\omega) = I_A(\omega), \quad Y(\omega) = I_B(\omega) \quad \text{for all } \omega \in \Omega.$$


Show that X and Y are independent if and only if A and B are independent. 
6. Let $X_1, X_2, \ldots, X_n$ be a set of exchangeable RVs. Then

$$E\left( \frac{X_1 + X_2 + \cdots + X_k}{X_1 + X_2 + \cdots + X_n} \right) = \frac{k}{n}, \qquad 1 \le k \le n.$$


7. Let X and Y be identically distributed. Construct an example to show that X and 
Y need not be equal; that is, P{X = Y} need not equal 1. 


8. Prove Lemma 1. 


9. Let $X_1, X_2, \ldots, X_n$ be RVs with joint PDF $f$, and let $f_j$ be the marginal PDF
of $X_j$ $(j = 1, 2, \ldots, n)$. Show that $X_1, X_2, \ldots, X_n$ are independent if and only
if

$$f(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} f_j(x_j) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n.$$

10. Suppose that two buses, A and B, operate on a route. A person arrives at a certain
bus stop on this route at time 0. Let $X$ and $Y$ be the arrival times of buses A and
B, respectively, at this bus stop. Suppose that $X$ and $Y$ are independent and have
density functions given, respectively, by

$$f_1(x) = \frac{1}{a}, \quad 0 \le x \le a, \quad \text{and zero elsewhere,}$$

and

$$f_2(y) = \frac{1}{b}, \quad 0 \le y \le b, \quad \text{and zero otherwise.}$$

What is the probability that bus A will arrive before bus B?


11. Consider two batteries, one of brand A and the other of brand B. Brand A batteries
have a length of life with density function

$$f(x) = 3\lambda x^2 \exp(-\lambda x^3), \quad x > 0, \quad \text{and zero elsewhere,}$$

whereas brand B batteries have a length of life with density function given by

$$g(y) = 3\mu y^2 \exp(-\mu y^3), \quad y > 0, \quad \text{and zero elsewhere.}$$

Brand A and brand B batteries operate independently and are put to a test. What
is the probability that the brand B battery will outlast brand A? In particular, what
is the probability if $\lambda = \mu$?


12. (a) Let $(X, Y)$ have joint density $f$. Show that $X$ and $Y$ are independent if and
only if for some constant $k > 0$ and nonnegative functions $f_1$ and $f_2$,

$$f(x, y) = k\,f_1(x)\,f_2(y)$$

for all $x, y \in \mathcal{R}$.

(b) Let $A = \{f_X(x) > 0\}$, $B = \{f_Y(y) > 0\}$, where $f_X$, $f_Y$ are the marginal densities
of $X$ and $Y$, respectively. Show that if $X$ and $Y$ are independent, then $\{f > 0\} = A \times B$.


13. If $\varphi$ is the CF of $X$, show that the CF of $X^s$ is real and even.


14. Let $X$, $Y$ be jointly distributed with PDF $f(x, y) = (1 - x^3 y)/4$ for $|x| < 1$,
$|y| < 1$, and $= 0$ otherwise. Show that $X \overset{d}{=} Y$ and that $X - Y$ has a symmetric
distribution.

4.4 FUNCTIONS OF SEVERAL RANDOM VARIABLES 


Let $X_1, X_2, \ldots, X_n$ be RVs defined on a probability space $(\Omega, \mathcal{S}, P)$. In practice
we deal with functions of $X_1, X_2, \ldots, X_n$ such as $X_1 + X_2$, $X_1 - X_2$, $X_1 X_2$,
$\min(X_1, \ldots, X_n)$, and so on. Are these also RVs? If so, how do we compute their
distribution given the joint distribution of $X_1, X_2, \ldots, X_n$?
What functions of $(X_1, X_2, \ldots, X_n)$ are RVs?


Theorem 1. Let $g: \mathcal{R}_n \to \mathcal{R}_m$ be a Borel-measurable function; that is, if $B \in \mathfrak{B}_m$,
then $g^{-1}(B) \in \mathfrak{B}_n$. If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is an n-dimensional RV $(n \ge 1)$,
then $g(\mathbf{X})$ is an m-dimensional RV.


Proof. For $B \in \mathfrak{B}_m$,

$$\{g(X_1, X_2, \ldots, X_n) \in B\} = \{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\},$$

and since $g^{-1}(B) \in \mathfrak{B}_n$, it follows that $\{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\} \in \mathcal{S}$, which
concludes the proof.


In particular, if $g: \mathcal{R}_n \to \mathcal{R}_m$ is a continuous function, then $g(X_1, X_2, \ldots, X_n)$
is an RV.

How do we compute the distribution of $g(X_1, X_2, \ldots, X_n)$? There are several
ways to go about it. We first consider the method of distribution functions. Suppose
that $Y = g(X_1, \ldots, X_n)$ is real-valued, and let $y \in \mathcal{R}$. Then

$$P\{Y \le y\} = P(g(X_1, \ldots, X_n) \le y) = \begin{cases} \displaystyle\sum_{\{(x_1, \ldots, x_n):\ g(x_1, \ldots, x_n) \le y\}} P(X_1 = x_1, \ldots, X_n = x_n) & \text{in the discrete case,} \\ \displaystyle\int \cdots \int_{\{(x_1, \ldots, x_n):\ g(x_1, \ldots, x_n) \le y\}} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n & \text{in the continuous case,} \end{cases}$$

where in the continuous case $f$ is the joint PDF of $(X_1, \ldots, X_n)$.

In the continuous case we can obtain the PDF of $Y = g(X_1, \ldots, X_n)$ by differentiating
the DF $P\{Y \le y\}$ with respect to $y$ provided that $Y$ is also of the continuous
type. In the discrete case it is easier to compute $P\{g(X_1, \ldots, X_n) = y\}$.


We take a few examples.


Example 1. Consider the bivariate negative binomial distribution with PMF

$$P\{X = x, Y = y\} = \frac{(x + y + k - 1)!}{x!\, y!\, (k - 1)!}\, p_1^x\, p_2^y\, (1 - p_1 - p_2)^k,$$

where $x, y = 0, 1, 2, \ldots$; $k \ge 1$ is an integer; $p_1, p_2 \in (0, 1)$; and $p_1 + p_2 < 1$.
Let us find the PMF of $U = X + Y$. We introduce an RV $V = Y$ (see Remark 1
below) so that $u = x + y$, $v = y$ represents a one-to-one mapping of $A = \{(x, y): x, y = 0, 1, 2, \ldots\}$
onto the set $B = \{(u, v): v = 0, 1, 2, \ldots, u;\ u = 0, 1, 2, \ldots\}$
with inverse map $x = u - v$, $y = v$. It follows that the joint PMF of $(U, V)$ is given
by

$$P\{U = u, V = v\} = \begin{cases} \dfrac{(u + k - 1)!}{(u - v)!\, v!\, (k - 1)!}\, p_1^{u - v}\, p_2^{v}\, (1 - p_1 - p_2)^k & \text{for } (u, v) \in B, \\ 0 & \text{otherwise.} \end{cases}$$

The marginal PMF of $U$ is given by

$$P\{U = u\} = \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!\, u!} \sum_{v=0}^{u} \binom{u}{v} p_1^{u - v} p_2^{v}$$
$$= \frac{(u + k - 1)!\,(1 - p_1 - p_2)^k}{(k - 1)!\, u!}\,(p_1 + p_2)^u$$
$$= \binom{u + k - 1}{u} (p_1 + p_2)^u (1 - p_1 - p_2)^k \qquad (u = 0, 1, 2, \ldots).$$
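
The summation over $v$ in Example 1 can be checked with exact arithmetic. In the sketch below the function names and the parameter values ($k = 3$, $p_1 = 1/5$, $p_2 = 3/10$) are illustrative assumptions; the check compares the sum of the joint PMF of $(U, V)$ over $v$ with the closed form for $P\{U = u\}$.

```python
from fractions import Fraction
from math import comb, factorial

def joint_uv(u, v, k, p1, p2):
    """P{U = u, V = v} for 0 <= v <= u, following the joint PMF of (U, V)."""
    coeff = Fraction(factorial(u + k - 1),
                     factorial(u - v) * factorial(v) * factorial(k - 1))
    return coeff * p1 ** (u - v) * p2 ** v * (1 - p1 - p2) ** k

def marginal_u(u, k, p1, p2):
    """Closed form C(u + k - 1, u) (p1 + p2)^u (1 - p1 - p2)^k."""
    return comb(u + k - 1, u) * (p1 + p2) ** u * (1 - p1 - p2) ** k

k, p1, p2 = 3, Fraction(1, 5), Fraction(3, 10)
checks = [
    sum(joint_uv(u, v, k, p1, p2) for v in range(u + 1)) == marginal_u(u, k, p1, p2)
    for u in range(8)
]
print(checks)
```

The closed form shows that $U = X + Y$ is again negative binomial, with the single parameter $p_1 + p_2$ in place of the pair $(p_1, p_2)$.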


Example 2. Let $(X_1, X_2)$ have uniform distribution on the triangle $\{0 \le x_1 \le x_2 \le 1\}$;
that is, $(X_1, X_2)$ has joint density function

$$f(x_1, x_2) = \begin{cases} 2, & 0 \le x_1 \le x_2 \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Let $Y = X_1 + X_2$. Then for $y \le 0$, $P(Y \le y) = 0$, and for $y \ge 2$, $P(Y \le y) = 1$.
For $0 < y < 2$, we have

$$P(Y \le y) = P(X_1 + X_2 \le y) = \iint_{\substack{0 \le x_1 \le x_2 \le 1 \\ x_1 + x_2 \le y}} f(x_1, x_2)\,dx_1\,dx_2.$$

There are two cases to consider according to whether $0 < y \le 1$ or $1 < y < 2$ (Fig.
1a and b). In the former case,

$$P(Y \le y) = \int_{x_1 = 0}^{y/2} \left( \int_{x_2 = x_1}^{y - x_1} 2\,dx_2 \right) dx_1 = 2 \int_{0}^{y/2} (y - 2x_1)\,dx_1 = \frac{y^2}{2},$$

and in the latter case,

$$P(Y \le y) = 1 - P(Y > y) = 1 - \int_{x_2 = y/2}^{1} \left( \int_{x_1 = y - x_2}^{x_2} 2\,dx_1 \right) dx_2 = 1 - 2 \int_{y/2}^{1} (2x_2 - y)\,dx_2 = 1 - \frac{(y - 2)^2}{2}.$$

Hence the density function of $Y$ is given by

$$f_Y(y) = \begin{cases} y, & 0 \le y \le 1, \\ 2 - y, & 1 < y \le 2, \\ 0, & \text{elsewhere.} \end{cases}$$

[Fig. 1. (a) $\{x_1 + x_2 \le y,\ 0 < x_1 \le x_2 \le 1,\ 0 < y \le 1\}$; (b) $\{x_1 + x_2 \le y,\ 0 < x_1 \le x_2 \le 1,\ 1 < y \le 2\}$.]
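
The two-case formula for $P(Y \le y)$ in Example 2 can be compared with a direct numerical integration over the region $\{0 \le x_1 \le x_2 \le 1,\ x_1 + x_2 \le y\}$. The following is a sketch under stated assumptions: the function names and the number of subintervals are choices made here, the inner integral over $x_2$ is done in closed form, and the outer integral over $x_1$ uses the midpoint rule.

```python
def cdf_sum_closed_form(y):
    """P(Y <= y) for Y = X1 + X2, from the two-case formula of Example 2."""
    if y <= 0:
        return 0.0
    if y <= 1:
        return y * y / 2.0
    if y < 2:
        return 1.0 - (y - 2.0) ** 2 / 2.0
    return 1.0

def cdf_sum_numeric(y, n=1000):
    """Integrate f = 2 over {0 <= x1 <= x2 <= 1, x1 + x2 <= y} numerically."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        # For fixed x1, x2 ranges over [x1, min(1, y - x1)] when that is nonempty.
        upper = min(1.0, y - x1)
        if upper > x1:
            total += 2.0 * (upper - x1) * h
    return total

for y in (0.3, 1.0, 1.5, 1.9):
    print(y, cdf_sum_numeric(y), cdf_sum_closed_form(y))
```

The agreement at $y = 1$ also confirms that the two branches of the DF join continuously there.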


The method of distribution functions can also be used in the case when g takes 
values in R,,, 1 < m <n, but the integration becomes more involved. 


Example 3. Let $X_1$ be the time that a customer takes from getting in line at a
service desk in a bank to completion of service, and let $X_2$ be the time she waits in
line before she reaches the service desk. Then $X_1 \ge X_2$ and $X_1 - X_2$ is the service
time of the customer. Suppose that the joint density of $(X_1, X_2)$ is given by

$$f(x_1, x_2) = \begin{cases} e^{-x_1}, & 0 \le x_2 \le x_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Then the joint distribution of $(Y_1, Y_2)$ is given
by

$$P(Y_1 \le y_1, Y_2 \le y_2) = \iint_{A} f(x_1, x_2)\,dx_1\,dx_2,$$

where $A = \{(x_1, x_2): x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 \le x_2 \le x_1 < \infty\}$. Clearly,
$x_1 + x_2 \ge x_1 - x_2$, so that the set $A$ is as shown in Fig. 2. It follows that

$$P(Y_1 \le y_1, Y_2 \le y_2) = \int_{x_2 = 0}^{(y_1 - y_2)/2} \left( \int_{x_1 = x_2}^{x_2 + y_2} e^{-x_1}\,dx_1 \right) dx_2 + \int_{x_2 = (y_1 - y_2)/2}^{y_1/2} \left( \int_{x_1 = x_2}^{y_1 - x_2} e^{-x_1}\,dx_1 \right) dx_2$$

$$= \int_{0}^{(y_1 - y_2)/2} e^{-x_2} (1 - e^{-y_2})\,dx_2 + \int_{(y_1 - y_2)/2}^{y_1/2} (e^{-x_2} - e^{-y_1 + x_2})\,dx_2$$

$$= (1 - e^{-y_2})(1 - e^{-(y_1 - y_2)/2}) + (e^{-(y_1 - y_2)/2} - e^{-y_1/2}) - e^{-y_1}(e^{y_1/2} - e^{(y_1 - y_2)/2})$$

$$= 1 - e^{-y_2} - 2e^{-y_1/2} + 2e^{-(y_1 + y_2)/2}.$$

[Fig. 2. $A = \{x_1 + x_2 \le y_1,\ x_1 - x_2 \le y_2,\ 0 \le x_2 \le x_1 < \infty\}$.]
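Hence the joint density of $Y_1, Y_2$ is given by

$$f_{Y_1, Y_2}(y_1, y_2) = \begin{cases} \tfrac{1}{2} e^{-(y_1 + y_2)/2}, & 0 < y_2 < y_1 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

The marginal densities of $Y_1, Y_2$ are easily obtained by integrating out the other variable:

$$f_{Y_1}(y_1) = \int_{0}^{y_1} \tfrac{1}{2} e^{-(y_1 + y_2)/2}\,dy_2 = e^{-y_1/2}(1 - e^{-y_1/2}) \quad \text{for } y_1 > 0, \text{ and 0 elsewhere;}$$

and

$$f_{Y_2}(y_2) = \int_{y_2}^{\infty} \tfrac{1}{2} e^{-(y_1 + y_2)/2}\,dy_1 = e^{-y_2} \quad \text{for } y_2 > 0, \text{ and 0 elsewhere.}$$
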
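We next consider the method of transformations. Let $(X_1, \ldots, X_n)$ be jointly
distributed with continuous PDF $f(x_1, x_2, \ldots, x_n)$, and let $\mathbf{y} = g(x_1, x_2, \ldots, x_n) = (y_1, y_2, \ldots, y_n)$, where

$$y_i = g_i(x_1, x_2, \ldots, x_n), \qquad i = 1, 2, \ldots, n,$$

be a mapping of $\mathcal{R}_n$ to $\mathcal{R}_n$. Then

$$P\{(Y_1, Y_2, \ldots, Y_n) \in B\} = P\{(X_1, X_2, \ldots, X_n) \in g^{-1}(B)\} = \int_{g^{-1}(B)} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i,$$

where $g^{-1}(B) = \{\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{R}_n: g(\mathbf{x}) \in B\}$. Let us choose $B$ to be the
n-dimensional interval

$$B = B_{\mathbf{y}} = \{(y_1', y_2', \ldots, y_n'): -\infty < y_i' \le y_i,\ i = 1, 2, \ldots, n\}.$$

Then the joint DF of $\mathbf{Y}$ is given by

$$P\{\mathbf{Y} \in B_{\mathbf{y}}\} = G_{\mathbf{Y}}(\mathbf{y}) = P\{g_1(\mathbf{X}) \le y_1, g_2(\mathbf{X}) \le y_2, \ldots, g_n(\mathbf{X}) \le y_n\} = \int \cdots \int_{g^{-1}(B_{\mathbf{y}})} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i,$$

and (if $G_{\mathbf{Y}}$ is absolutely continuous) the PDF of $\mathbf{Y}$ is given by

$$w(\mathbf{y}) = \frac{\partial^n G_{\mathbf{Y}}(\mathbf{y})}{\partial y_1\, \partial y_2 \cdots \partial y_n}$$

at every continuity point $\mathbf{y}$ of $w$. Under certain conditions it is possible to write $w$ in
terms of $f$ by making a change of variable in the multiple integral.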


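Theorem 2. Let $(X_1, X_2, \dots, X_n)$ be an $n$-dimensional RV of the continuous
type with PDF $f(x_1, x_2, \dots, x_n)$.

(a) Let
\[
y_1 = g_1(x_1, x_2, \dots, x_n), \quad
y_2 = g_2(x_1, x_2, \dots, x_n), \quad \dots, \quad
y_n = g_n(x_1, x_2, \dots, x_n)
\]
be a one-to-one mapping of $\mathcal{R}_n$ into itself; that is, there exists the inverse
transformation
\[
x_1 = h_1(y_1, y_2, \dots, y_n), \quad x_2 = h_2(y_1, y_2, \dots, y_n), \quad \dots, \quad
x_n = h_n(y_1, y_2, \dots, y_n)
\]
defined over the range of the transformation.

(b) Assume that both the mapping and its inverse are continuous.
(c) Assume that the partial derivatives
\[
\frac{\partial x_i}{\partial y_j}, \qquad 1 \le i \le n, \quad 1 \le j \le n,
\]
exist and are continuous.
(d) Assume that the Jacobian $J$ of the inverse transformation
\[
J = \frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)} =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix}
\]
is different from zero for $(y_1, y_2, \dots, y_n)$ in the range of the transformation.
Then $(Y_1, Y_2, \dots, Y_n)$ has a joint absolutely continuous DF with PDF given
by
\[
w(y_1, y_2, \dots, y_n) = |J|\, f\bigl(h_1(y_1, \dots, y_n), \dots, h_n(y_1, \dots, y_n)\bigr). \tag{1}
\]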
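Proof. For $(y_1, y_2, \dots, y_n) \in \mathcal{R}_n$, let
\[
B = \{(y_1', y_2', \dots, y_n') \in \mathcal{R}_n : -\infty < y_i' \le y_i,\ i = 1, 2, \dots, n\}.
\]
Then
\[
g^{-1}(B) = \{x \in \mathcal{R}_n : g(x) \in B\} = \{(x_1, x_2, \dots, x_n) : g_i(x) \le y_i,\ i = 1, 2, \dots, n\}
\]
and
\[
G_Y(y) = P\{Y \in B\} = P\{X \in g^{-1}(B)\}
= \int \cdots \int_{g^{-1}(B)} f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n
\]
\[
= \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_n} f\bigl(h_1(y'), \dots, h_n(y')\bigr)
\left| \frac{\partial(x_1, x_2, \dots, x_n)}{\partial(y_1', y_2', \dots, y_n')} \right| dy_n' \cdots dy_1',
\]
where $y' = (y_1', y_2', \dots, y_n')$. Result (1) now follows on differentiation of the DF $G_Y$.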


Remark 1. In actual applications we will not know the mapping from $x_1, x_2, \dots, x_n$
to $y_1, y_2, \dots, y_n$ completely, but one or more of the functions $g_i$ will be
known. If only $k$, $1 \le k < n$, of the $g_i$'s are known, we introduce arbitrarily $n - k$
functions such that the conditions of the theorem are satisfied. To find the joint
marginal density of these $k$ variables, we simply integrate the $w$ function over all the
$n - k$ variables that were introduced arbitrarily.


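Remark 2. An analog of Theorem 2.5.4 holds, which we state without proof.

Let $X = (X_1, X_2, \dots, X_n)$ be an RV of the continuous type with joint PDF $f$,
and let $y_i = g_i(x_1, x_2, \dots, x_n)$, $i = 1, 2, \dots, n$, be a mapping of $\mathcal{R}_n$ into itself.
Suppose that for each $y$ the transformation $g$ has a finite number $k = k(y)$ of inverses.
Suppose further that $\mathcal{R}_n$ can be partitioned into $k$ disjoint sets $A_1, A_2, \dots, A_k$, such
that the transformation $g$ from $A_i$ $(i = 1, 2, \dots, k)$ into $\mathcal{R}_n$ is one-to-one with
inverse transformation
\[
x_1 = h_{1i}(y_1, y_2, \dots, y_n), \quad \dots, \quad x_n = h_{ni}(y_1, y_2, \dots, y_n), \qquad i = 1, 2, \dots, k.
\]
Suppose that the first partial derivatives are continuous and that each Jacobian
\[
J_i =
\begin{vmatrix}
\dfrac{\partial h_{1i}}{\partial y_1} & \dfrac{\partial h_{1i}}{\partial y_2} & \cdots & \dfrac{\partial h_{1i}}{\partial y_n} \\
\dfrac{\partial h_{2i}}{\partial y_1} & \dfrac{\partial h_{2i}}{\partial y_2} & \cdots & \dfrac{\partial h_{2i}}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial h_{ni}}{\partial y_1} & \dfrac{\partial h_{ni}}{\partial y_2} & \cdots & \dfrac{\partial h_{ni}}{\partial y_n}
\end{vmatrix}
\]
is different from zero in the range of the transformation. Then the joint PDF of $Y$ is
given by
\[
w(y_1, y_2, \dots, y_n) = \sum_{i=1}^{k} |J_i|\, f\bigl(h_{1i}(y_1, y_2, \dots, y_n), \dots, h_{ni}(y_1, y_2, \dots, y_n)\bigr).
\]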


Example 4. Let $X_1, X_2, X_3$ be iid RVs with common exponential density function
\[
f(x) =
\begin{cases}
e^{-x} & \text{if } x > 0,\\
0 & \text{otherwise.}
\end{cases}
\]
Also, let
\[
Y_1 = X_1 + X_2 + X_3, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad \text{and} \qquad Y_3 = \frac{X_1}{X_1 + X_2}.
\]
Then
\[
x_1 = y_1 y_2 y_3, \qquad x_2 = y_1 y_2 - x_1 = y_1 y_2 (1 - y_3), \qquad \text{and} \qquad
x_3 = y_1 - y_1 y_2 = y_1 (1 - y_2).
\]
The Jacobian of transformation is given by
\[
J =
\begin{vmatrix}
y_2 y_3 & y_1 y_3 & y_1 y_2 \\
y_2 (1 - y_3) & y_1 (1 - y_3) & -y_1 y_2 \\
1 - y_2 & -y_1 & 0
\end{vmatrix}
= -y_1^2 y_2.
\]
Note that $0 < y_1 < \infty$, $0 < y_2 < 1$, and $0 < y_3 < 1$. Thus the joint PDF of
$Y_1, Y_2, Y_3$ is given by
\[
w(y_1, y_2, y_3) = y_1^2 y_2 e^{-y_1}
= (2 y_2)\bigl(\tfrac{1}{2} y_1^2 e^{-y_1}\bigr), \qquad 0 < y_1 < \infty, \quad 0 < y_2, y_3 < 1.
\]
It follows that $Y_1$, $Y_2$, and $Y_3$ are independent.

Fig. 3. $\{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}$.


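Example 5. Let $X_1, X_2$ be independent RVs with common density given by
\[
f(x) =
\begin{cases}
1 & \text{if } 0 < x < 1,\\
0 & \text{otherwise.}
\end{cases}
\]
Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. Then the Jacobian of the transformation is given
by
\[
J =
\begin{vmatrix}
\tfrac{1}{2} & \tfrac{1}{2} \\
\tfrac{1}{2} & -\tfrac{1}{2}
\end{vmatrix}
= -\frac{1}{2},
\]
and the joint density of $Y_1, Y_2$ (Fig. 3) is given by
\[
f_{Y_1,Y_2}(y_1, y_2) = \frac{1}{2}\, f\!\left(\frac{y_1 + y_2}{2}\right) f\!\left(\frac{y_1 - y_2}{2}\right)
\qquad \text{if } 0 < \frac{y_1 + y_2}{2} < 1,\ 0 < \frac{y_1 - y_2}{2} < 1,
\]
\[
= \frac{1}{2} \qquad \text{if } (y_1, y_2) \in \{0 < y_1 + y_2 < 2,\ 0 < y_1 - y_2 < 2\}.
\]
The marginal PDFs of $Y_1$ and $Y_2$ are given by
\[
f_{Y_1}(y_1) =
\begin{cases}
\displaystyle\int_{-y_1}^{y_1} \tfrac{1}{2}\, dy_2 = y_1, & 0 < y_1 \le 1,\\[2mm]
\displaystyle\int_{y_1 - 2}^{2 - y_1} \tfrac{1}{2}\, dy_2 = 2 - y_1, & 1 < y_1 < 2,\\[2mm]
0, & \text{otherwise;}
\end{cases}
\]
\[
f_{Y_2}(y_2) =
\begin{cases}
\displaystyle\int_{-y_2}^{y_2 + 2} \tfrac{1}{2}\, dy_1 = y_2 + 1, & -1 < y_2 \le 0,\\[2mm]
\displaystyle\int_{y_2}^{2 - y_2} \tfrac{1}{2}\, dy_1 = 1 - y_2, & 0 < y_2 < 1,\\[2mm]
0, & \text{otherwise.}
\end{cases}
\]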
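As a rough numerical check of Example 5 (a sketch only, not part of the original development), the joint density can be evaluated from formula (1) with $|J| = 1/2$ and then integrated over $y_2$ by a midpoint rule. The function names `f`, `joint_pdf`, and `marginal_y1`, and the choice of 20000 integration steps, are illustrative assumptions.

```python
def f(x):
    """Common uniform(0, 1) density of X1 and X2."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def joint_pdf(y1, y2):
    """Formula (1): |J| f(h1(y)) f(h2(y)) with x1 = (y1+y2)/2, x2 = (y1-y2)/2."""
    return 0.5 * f((y1 + y2) / 2.0) * f((y1 - y2) / 2.0)


def marginal_y1(y1, steps=20000):
    """Midpoint-rule integral of the joint density over y2 in (-1, 1)."""
    h = 2.0 / steps
    return sum(joint_pdf(y1, -1.0 + (k + 0.5) * h) for k in range(steps)) * h
```

Under these assumptions, `marginal_y1(0.5)` and `marginal_y1(1.5)` should both be close to 0.5, in agreement with the triangular marginal $f_{Y_1}$ obtained above.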
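Example 6. Let $X_1, X_2, X_3$ be iid RVs with common PDF
\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.
\]
Let $Y_1 = (X_1 - X_2)/\sqrt{2}$, $Y_2 = (X_1 + X_2 - 2X_3)/\sqrt{6}$, and $Y_3 = (X_1 + X_2 + X_3)/\sqrt{3}$.
Then
\[
x_1 = \frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}}, \qquad
x_2 = -\frac{y_1}{\sqrt{2}} + \frac{y_2}{\sqrt{6}} + \frac{y_3}{\sqrt{3}},
\]
and
\[
x_3 = -\frac{\sqrt{2}\, y_2}{\sqrt{3}} + \frac{y_3}{\sqrt{3}}.
\]
The Jacobian of transformation is given by
\[
J =
\begin{vmatrix}
\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\[2mm]
-\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{6}} & \dfrac{1}{\sqrt{3}} \\[2mm]
0 & -\dfrac{\sqrt{2}}{\sqrt{3}} & \dfrac{1}{\sqrt{3}}
\end{vmatrix}
= 1.
\]
The joint PDF of $X_1, X_2, X_3$ is given by
\[
g(x_1, x_2, x_3) = \frac{1}{(2\pi)^{3/2}} \exp\left\{ -\frac{x_1^2 + x_2^2 + x_3^2}{2} \right\}, \qquad x_1, x_2, x_3 \in \mathcal{R}.
\]
It is easily checked that
\[
x_1^2 + x_2^2 + x_3^2 = y_1^2 + y_2^2 + y_3^2,
\]
so that the joint PDF of $Y_1, Y_2, Y_3$ is given by
\[
w(y_1, y_2, y_3) = \frac{1}{(2\pi)^{3/2}} \exp\left\{ -\frac{y_1^2 + y_2^2 + y_3^2}{2} \right\}.
\]
It follows that $Y_1, Y_2, Y_3$ are also iid RVs with common PDF $f$.

In Example 6 the transformation used is orthogonal and is known as Helmert's
transformation. In fact, we will show in Section 7.6 that under orthogonal
transformations iid RVs with PDF $f$ defined above are transformed into iid RVs with the
same PDF.

In Example 6 it is easily verified that
\[
y_1^2 + y_2^2 = \sum_{j=1}^{3} \left( x_j - \frac{x_1 + x_2 + x_3}{3} \right)^2.
\]
We have therefore proved that $(X_1 + X_2 + X_3)$ is independent of $\sum_{j=1}^{3} \{X_j - [(X_1 + X_2 + X_3)/3]\}^2$.
This is a very important result in mathematical statistics, and we will
return to it in Section 7.5.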


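Example 7. Let $(X, Y)$ be a bivariate normal RV with joint PDF
\[
f(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_1)^2}{\sigma_1^2}
- \frac{2\rho (x - \mu_1)(y - \mu_2)}{\sigma_1 \sigma_2}
+ \frac{(y - \mu_2)^2}{\sigma_2^2} \right] \right\},
\]
\[
-\infty < x < \infty, \quad -\infty < y < \infty; \quad \mu_1 \in \mathcal{R}, \quad \mu_2 \in \mathcal{R};
\quad \text{and} \quad \sigma_1 > 0, \quad \sigma_2 > 0, \quad |\rho| < 1.
\]
Let
\[
U_1 = \sqrt{X^2 + Y^2} \qquad \text{and} \qquad U_2 = \frac{X}{Y}.
\]
For $u_1 > 0$, the equations
\[
\sqrt{x^2 + y^2} = u_1 \qquad \text{and} \qquad \frac{x}{y} = u_2
\]
have two solutions:
\[
x_1 = \frac{u_1 u_2}{\sqrt{1 + u_2^2}}, \quad y_1 = \frac{u_1}{\sqrt{1 + u_2^2}}, \qquad \text{and} \qquad x_2 = -x_1, \quad y_2 = -y_1
\]
for any $u_2 \in \mathcal{R}$. The Jacobians are given by
\[
J_1 = J_2 =
\begin{vmatrix}
\dfrac{u_2}{\sqrt{1 + u_2^2}} & \dfrac{u_1}{(1 + u_2^2)^{3/2}} \\[3mm]
\dfrac{1}{\sqrt{1 + u_2^2}} & -\dfrac{u_1 u_2}{(1 + u_2^2)^{3/2}}
\end{vmatrix}
= -\frac{u_1}{1 + u_2^2}.
\]
It follows from the result in Remark 2 that the joint PDF of $(U_1, U_2)$ is given by
\[
w(u_1, u_2) =
\begin{cases}
\dfrac{u_1}{1 + u_2^2} \left[ f\!\left( \dfrac{u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{u_1}{\sqrt{1 + u_2^2}} \right)
+ f\!\left( \dfrac{-u_1 u_2}{\sqrt{1 + u_2^2}}, \dfrac{-u_1}{\sqrt{1 + u_2^2}} \right) \right]
& \text{if } u_1 > 0,\ u_2 \in \mathcal{R},\\[3mm]
0 & \text{otherwise.}
\end{cases}
\]
In the special case where $\mu_1 = \mu_2 = 0$, $\rho = 0$, and $\sigma_1 = \sigma_2 = \sigma$, we have
\[
f(x, y) = \frac{1}{2\pi \sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2},
\]
so that $X$ and $Y$ are independent. Moreover,
\[
f(x, y) = f(-x, -y),
\]
and it follows that when $X$ and $Y$ are independent,
\[
w(u_1, u_2) =
\begin{cases}
\dfrac{1}{2\pi \sigma^2}\, \dfrac{2 u_1}{1 + u_2^2}\, e^{-u_1^2/2\sigma^2}, & u_1 > 0, \quad -\infty < u_2 < \infty,\\[3mm]
0, & \text{otherwise.}
\end{cases}
\]
Since
\[
w(u_1, u_2) = \frac{1}{\pi (1 + u_2^2)} \cdot \frac{u_1}{\sigma^2}\, e^{-u_1^2/2\sigma^2},
\]
it follows that $U_1$ and $U_2$ are independent with marginal PDFs given by
\[
w_1(u_1) =
\begin{cases}
\dfrac{u_1}{\sigma^2}\, e^{-u_1^2/2\sigma^2}, & u_1 > 0,\\[2mm]
0, & u_1 \le 0,
\end{cases}
\]
and
\[
w_2(u_2) = \frac{1}{\pi (1 + u_2^2)}, \qquad -\infty < u_2 < \infty,
\]
respectively.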


An important application of the result in Remark 2 will appear in Theorem 4.7.2. 
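Theorem 3. Let $(X, Y)$ be an RV of the continuous type with PDF $f$. Let
\[
Z = X + Y, \qquad U = X - Y, \qquad \text{and} \qquad V = XY;
\]
and let $W = X/Y$. Then the PDFs of $Z$, $U$, $V$, and $W$ are, respectively, given by
\[
f_Z(z) = \int_{-\infty}^{\infty} f(x, z - x)\, dx, \tag{2}
\]
\[
f_U(u) = \int_{-\infty}^{\infty} f(u + y, y)\, dy, \tag{3}
\]
\[
f_V(v) = \int_{-\infty}^{\infty} f\!\left(x, \frac{v}{x}\right) \frac{1}{|x|}\, dx, \tag{4}
\]
and
\[
f_W(w) = \int_{-\infty}^{\infty} f(xw, x)\, |x|\, dx. \tag{5}
\]
The proof is left as an exercise.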


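Corollary. If $X$ and $Y$ are independent with PDFs $f_1$ and $f_2$, respectively, then
\[
f_Z(z) = \int_{-\infty}^{\infty} f_1(x) f_2(z - x)\, dx, \tag{6}
\]
\[
f_U(u) = \int_{-\infty}^{\infty} f_1(u + y) f_2(y)\, dy, \tag{7}
\]
\[
f_V(v) = \int_{-\infty}^{\infty} f_1(x) f_2\!\left(\frac{v}{x}\right) \frac{1}{|x|}\, dx, \tag{8}
\]
and
\[
f_W(w) = \int_{-\infty}^{\infty} f_1(xw) f_2(x)\, |x|\, dx. \tag{9}
\]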
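The convolution formula (6) can be checked numerically. The sketch below is only an illustration: it assumes both summands have the exponential density $e^{-x}$, $x > 0$, for which (6) gives $f_Z(z) = z e^{-z}$, and it approximates the integral by a midpoint rule. The names `exp_pdf` and `sum_pdf`, the integration range, and the step count are assumptions made for this example.

```python
import math


def exp_pdf(x):
    """Density e^{-x} for x > 0, zero elsewhere."""
    return math.exp(-x) if x > 0.0 else 0.0


def sum_pdf(z, f1=exp_pdf, f2=exp_pdf, lo=0.0, hi=40.0, steps=40000):
    """Midpoint-rule approximation of (6): integral of f1(x) f2(z - x) dx.

    The range (lo, hi) is adequate only when f1 vanishes outside it.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f1(x) * f2(z - x)
    return total * h
```

For instance, `sum_pdf(2.0)` should be close to `2.0 * math.exp(-2.0)`. Formulas (7) to (9) can be approximated the same way by changing the integrand.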


Remark 3. Let $F$ and $G$ be two absolutely continuous DFs; then
\[
H(x) = \int_{-\infty}^{\infty} F(x - y)\, G'(y)\, dy = \int_{-\infty}^{\infty} G(x - y)\, F'(y)\, dy
\]
is also an absolutely continuous DF with PDF
\[
H'(x) = \int_{-\infty}^{\infty} F'(x - y)\, G'(y)\, dy = \int_{-\infty}^{\infty} G'(x - y)\, F'(y)\, dy.
\]
If
\[
F(x) = \sum_{k} p_k\, \varepsilon(x - x_k) \qquad \text{and} \qquad G(x) = \sum_{j} q_j\, \varepsilon(x - y_j)
\]
are two DFs, then
\[
H(x) = \sum_{k} \sum_{j} p_k q_j\, \varepsilon(x - x_k - y_j)
\]
is also a DF of an RV of the discrete type. The DF $H$ is called the convolution of
$F$ and $G$, and we write $H = F * G$. Clearly, the operation is commutative and
associative; that is, if $F_1, F_2, F_3$ are DFs, $F_1 * F_2 = F_2 * F_1$ and $(F_1 * F_2) * F_3 =
F_1 * (F_2 * F_3)$. In this terminology, if $X$ and $Y$ are independent RVs with DFs $F$ and
$G$, respectively, $X + Y$ has the convolution DF $H = F * G$. Extension to an arbitrary
number of independent RVs is obvious.


Finally, we consider a technique based on MGF or CF which can be used in
certain situations to determine the distribution of a function $g(X_1, X_2, \dots, X_n)$ of
$X_1, X_2, \dots, X_n$.

Let $(X_1, X_2, \dots, X_n)$ be an $n$-variate RV, and $g$ be a Borel-measurable function
from $\mathcal{R}_n$ to $\mathcal{R}_1$.

Definition 1. If $(X_1, X_2, \dots, X_n)$ is of the discrete type and
\[
\sum_{x_1, \dots, x_n} |g(x_1, x_2, \dots, x_n)|\, P\{X_1 = x_1, X_2 = x_2, \dots, X_n = x_n\} < \infty,
\]
then the series
\[
E g(X_1, X_2, \dots, X_n) = \sum_{x_1, \dots, x_n} g(x_1, x_2, \dots, x_n)\, P\{X_1 = x_1, X_2 = x_2, \dots, X_n = x_n\}
\]
is called the expected value of $g(X_1, X_2, \dots, X_n)$. If $(X_1, X_2, \dots, X_n)$ is a
continuous RV with joint PDF $f$, and if
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |g(x_1, x_2, \dots, x_n)|\, f(x_1, x_2, \dots, x_n) \prod_{i=1}^{n} dx_i < \infty,
\]
then
\[
E g(X_1, X_2, \dots, X_n) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n)\, f(x_1, x_2, \dots, x_n) \prod_{i=1}^{n} dx_i
\]
is called the expected value of $g(X_1, X_2, \dots, X_n)$.

Let $Y = g(X_1, X_2, \dots, X_n)$, and let $h(y)$ be its PDF. If $E|Y| < \infty$, then
\[
EY = \int_{-\infty}^{\infty} y\, h(y)\, dy.
\]
An analog of Theorem 3.2.1 holds. That is,
\[
\int_{-\infty}^{\infty} y\, h(y)\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n)\, f(x_1, x_2, \dots, x_n) \prod_{i=1}^{n} dx_i,
\]
in the sense that if either integral exists, so does the other, and the two are equal. The
result also holds in the discrete case.

Some special functions of interest are $\sum_{j=1}^{n} x_j$; $\prod_{j=1}^{n} x_j^{k_j}$, where $k_1, k_2, \dots, k_n$
are nonnegative integers; $e^{\sum_{j=1}^{n} t_j x_j}$, where $t_1, t_2, \dots, t_n$ are real numbers; and
$e^{i \sum_{j=1}^{n} t_j x_j}$, where $i = \sqrt{-1}$.


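Definition 2. Let $X_1, X_2, \dots, X_n$ be jointly distributed. If $E\bigl(e^{\sum_{j=1}^{n} t_j X_j}\bigr)$ exists
for $|t_j| < h_j$, $j = 1, 2, \dots, n$, for some $h_j > 0$, $j = 1, 2, \dots, n$, we write
\[
M(t_1, t_2, \dots, t_n) = E e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n} \tag{10}
\]
and call it the MGF of the joint distribution of $(X_1, X_2, \dots, X_n)$ or, simply, the MGF
of $(X_1, X_2, \dots, X_n)$.

Definition 3. Let $t_1, t_2, \dots, t_n$ be real numbers and $i = \sqrt{-1}$. Then the CF of
$(X_1, X_2, \dots, X_n)$ is defined by
\[
\phi(t_1, t_2, \dots, t_n) = E\left[ \exp\left( i \sum_{j=1}^{n} t_j X_j \right) \right]
= E\left[ \cos\left( \sum_{j=1}^{n} t_j X_j \right) \right] + i\, E\left[ \sin\left( \sum_{j=1}^{n} t_j X_j \right) \right]. \tag{11}
\]
As in the univariate case, $\phi(t_1, t_2, \dots, t_n)$ always exists.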


We will deal mostly with the MGF even though the condition that it exist for $|t_j| <
h_j$, $j = 1, 2, \dots, n$, restricts its application considerably. The multivariate MGF
(CF) has properties similar to the univariate MGF discussed earlier. We state some of
these without proof. For notational convenience we restrict ourselves to the bivariate
case.


Theorem 4. The MGF $M(t_1, t_2)$ uniquely determines the joint distribution of
$(X, Y)$, and conversely, if the MGF exists, it is unique.


Corollary. The MGF $M(t_1, t_2)$ completely determines the marginal distributions
of $X$ and $Y$. Indeed,
\[
M(t_1, 0) = E e^{t_1 X} = M_X(t_1), \tag{12}
\]
and
\[
M(0, t_2) = E e^{t_2 Y} = M_Y(t_2). \tag{13}
\]


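Theorem 5. If $M(t_1, t_2)$ exists, the moments of all orders of $(X, Y)$ exist and may
be obtained from
\[
\left. \frac{\partial^{m+n} M(t_1, t_2)}{\partial t_1^m\, \partial t_2^n} \right|_{t_1 = t_2 = 0} = E(X^m Y^n). \tag{14}
\]
Thus
\[
\frac{\partial M(0, 0)}{\partial t_1} = EX, \qquad \frac{\partial M(0, 0)}{\partial t_2} = EY,
\]
\[
\frac{\partial^2 M(0, 0)}{\partial t_1^2} = EX^2, \qquad \frac{\partial^2 M(0, 0)}{\partial t_2^2} = EY^2,
\]
\[
\frac{\partial^2 M(0, 0)}{\partial t_1\, \partial t_2} = E(XY),
\]
and so on.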


A formal definition of moments in the multivariate case will be given in Sec- 
tion 4.5. 


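Theorem 6. $X$ and $Y$ are independent RVs if and only if
\[
M(t_1, t_2) = M(t_1, 0)\, M(0, t_2) \qquad \text{for all } t_1, t_2 \in \mathcal{R}. \tag{15}
\]
Proof. Let $X$ and $Y$ be independent. Then
\[
M(t_1, t_2) = E e^{t_1 X + t_2 Y} = (E e^{t_1 X})(E e^{t_2 Y}) = M(t_1, 0)\, M(0, t_2).
\]
Conversely, if
\[
M(t_1, t_2) = M(t_1, 0)\, M(0, t_2),
\]
then in the continuous case,
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\, dx\, dy
= \left[ \int_{-\infty}^{\infty} e^{t_1 x} f_1(x)\, dx \right] \left[ \int_{-\infty}^{\infty} e^{t_2 y} f_2(y)\, dy \right],
\]
that is,
\[
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\, dx\, dy
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f_1(x) f_2(y)\, dx\, dy.
\]
By the uniqueness of the MGF (Theorem 4) we must have
\[
f(x, y) = f_1(x) f_2(y) \qquad \text{for all } (x, y) \in \mathcal{R}_2.
\]
It follows that $X$ and $Y$ are independent. A similar proof is given in the case where
$(X, Y)$ is of the discrete type.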


The MGF technique uses the uniqueness property of Theorem 4. To find the
distribution (DF, PDF, or PMF) of $Y = g(X_1, X_2, \dots, X_n)$ we compute the MGF of $Y$
using the definition. If this MGF is of a known form, $Y$ must have the corresponding
distribution. Although the technique applies to the case when $Y$ is an $m$-dimensional
RV, $1 \le m \le n$, we will use it mostly for the $m = 1$ case.


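Example 8. Let us first consider a simple case when $X$ has the normal PDF
\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.
\]
Let $Y = X^2$. Then
\[
M_Y(s) = E e^{sX^2} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1/2)(1 - 2s)x^2}\, dx
= \frac{1}{\sqrt{1 - 2s}} \qquad \text{for } s < \tfrac{1}{2}.
\]
It follows (see Section 5.3 and Example 2.5.7) that $Y$ has a chi-square PDF
\[
w(y) = \frac{e^{-y/2}}{\sqrt{2\pi}\, \sqrt{y}}, \qquad y > 0.
\]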
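The closed form $M_Y(s) = (1 - 2s)^{-1/2}$ in Example 8 can be compared with a direct numerical evaluation of $E e^{sX^2}$. The following is a minimal sketch under stated assumptions: the integral is truncated to $(-12, 12)$ and approximated by a midpoint rule, and the function name `mgf_of_square` is hypothetical.

```python
import math


def mgf_of_square(s, half_width=12.0, steps=20000):
    """Midpoint-rule value of E exp(s X^2) for standard normal X (needs s < 1/2)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        # integrand: exp(s x^2) times the N(0, 1) kernel exp(-x^2 / 2)
        total += math.exp((s - 0.5) * x * x)
    return total * h / math.sqrt(2.0 * math.pi)
```

The truncation is harmless only when $s$ is not too close to $1/2$; as $s$ approaches $1/2$ the integrand decays slowly and a wider range would be needed.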


Example 9. Suppose that $X_1$ and $X_2$ are independent with common PDF $f$ of
Example 8. Let $Y_1 = X_1 - X_2$. There are three equivalent ways to use the MGF technique
here. Let $Y_2 = X_2$. Then rather than compute
\[
M(s_1, s_2) = E e^{s_1 Y_1 + s_2 Y_2},
\]
it is simpler to recognize that $Y_1$ is univariate, so
\[
M_{Y_1}(s) = E e^{s(X_1 - X_2)} = (E e^{sX_1})(E e^{-sX_2}) = e^{s^2/2}\, e^{s^2/2} = e^{s^2}.
\]
It follows that $Y_1$ has PDF
\[
f(x) = \frac{1}{\sqrt{4\pi}}\, e^{-x^2/4}, \qquad -\infty < x < \infty.
\]
Note that $M_{Y_1}(s) = M(s, 0)$.

Let $Y_3 = X_1 + X_2$. Let us find the joint distribution of $Y_1$ and $Y_3$. Indeed,
\[
E e^{s_1 Y_1 + s_2 Y_3} = E\bigl( e^{(s_1 + s_2) X_1} \cdot e^{(s_2 - s_1) X_2} \bigr)
= \bigl( E e^{(s_1 + s_2) X_1} \bigr) \bigl( E e^{(s_2 - s_1) X_2} \bigr)
\]
\[
= e^{(s_1 + s_2)^2/2} \cdot e^{(s_1 - s_2)^2/2} = e^{s_1^2}\, e^{s_2^2},
\]
and it follows that $Y_1$ and $Y_3$ are independent RVs with common PDF $f$ defined
above.


The following result has many applications, as we will see. Example 9 is a special 
case. 


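Theorem 7. Let $X_1, X_2, \dots, X_n$ be independent RVs with respective MGFs
$M_i(s)$, $i = 1, 2, \dots, n$. Then the MGF of $Y = \sum_{i=1}^{n} a_i X_i$ for real numbers
$a_1, a_2, \dots, a_n$ is given by
\[
M_Y(s) = \prod_{i=1}^{n} M_i(a_i s).
\]
Proof. If $M_i$ exists for $|s| < h_i$, $h_i > 0$, then $M_i(a_i s)$ exists for $|a_i s| < h_i$, so $M_Y$ exists
for $|s| < \min\{h_i/|a_i| : a_i \ne 0\}$, and
\[
M_Y(s) = E e^{s \sum_{i=1}^{n} a_i X_i} = \prod_{i=1}^{n} E e^{a_i s X_i} = \prod_{i=1}^{n} M_i(a_i s).
\]
Corollary. If the $X_i$'s are iid, the MGF of $Y = \sum_{i=1}^{n} X_i$ is given by $M_Y(s) = [M(s)]^n$.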


Remark 4. The converse of Theorem 7 does not hold. We leave the reader to 
construct an example illustrating this fact. 


Example 10. Let $X_1, X_2, \dots, X_m$ be iid RVs with common PMF
\[
P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, 2, \dots, n; \quad 0 < p < 1.
\]
Then the MGF of $X_i$ is given by
\[
M(t) = (1 - p + pe^t)^n.
\]
It follows that the MGF of $S_m = X_1 + X_2 + \cdots + X_m$ is
\[
M_{S_m}(t) = \prod_{1}^{m} (1 - p + pe^t)^n = (1 - p + pe^t)^{nm},
\]
and we see that $S_m$ has the PMF
\[
P\{S_m = s\} = \binom{mn}{s} p^s (1 - p)^{mn - s}, \qquad s = 0, 1, 2, \dots, mn.
\]
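The conclusion of Example 10 can also be confirmed without MGFs by convolving PMFs directly, which is a useful cross-check on the technique. The sketch below uses exact rational arithmetic; the function names `binomial_pmf`, `convolve`, and `pmf_of_sum` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """List of P{X = k}, k = 0..n, for the binomial(n, p) PMF."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of two independent nonnegative integer RVs with PMFs a, b."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def pmf_of_sum(n, p, m):
    """PMF of S_m = X_1 + ... + X_m for iid binomial(n, p) summands."""
    total = [Fraction(1)]  # PMF of the constant 0
    for _ in range(m):
        total = convolve(total, binomial_pmf(n, p))
    return total
```

With `p = Fraction(1, 4)`, the list `pmf_of_sum(3, p, 2)` should coincide exactly with `binomial_pmf(6, p)`, as the MGF argument predicts.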
s 


From these examples it is clear that to use this technique effectively one must be 
able to recognize the MGF of the function under consideration. In Chapter 5 we study 
a number of commonly occurring probability distributions and derive their MGFs 
(whenever they exist). We will have occasion to use Theorem 7 quite frequently. 

For integer-valued RVs one can sometimes use PGFs to compute the distribution 
of certain functions of a multiple RV. 

We emphasize the fact that a CF always exists and analogs of Theorems 4 to 7 
can be stated in terms of CF’s. 


PROBLEMS 4.4 


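1. Let $F$ be a DF and $\varepsilon$ be a positive real number. Show that
\[
\Psi_1(x) = \frac{1}{\varepsilon} \int_{x}^{x + \varepsilon} F(t)\, dt
\]
and
\[
\Psi_2(x) = \frac{1}{2\varepsilon} \int_{x - \varepsilon}^{x + \varepsilon} F(t)\, dt
\]
are also distribution functions.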


2. Let $X, Y$ be iid RVs with common PDF
\[
f(x) =
\begin{cases}
e^{-x} & \text{if } x > 0,\\
0 & \text{if } x \le 0.
\end{cases}
\]
(a) Find the PDF of RVs $X + Y$, $X - Y$, $XY$, $X/Y$, $\min\{X, Y\}$, $\max\{X, Y\}$,
$\min\{X, Y\}/\max\{X, Y\}$, and $X/(X + Y)$.
(b) Let $U = X + Y$ and $V = X - Y$. Find the conditional PDF of $V$, given
$U = u$, for some fixed $u > 0$.
(c) Show that $U$ and $Z = X/(X + Y)$ are independent.


3. Let $X$ and $Y$ be independent RVs defined on the space $(\Omega, \mathcal{S}, P)$. Let $X$ be
uniformly distributed on $(-a, a)$, $a > 0$, and $Y$ be an RV of the continuous type
with density $f$, where $f$ is continuous and positive on $\mathcal{R}$. Let $F$ be the DF of $Y$.
If $u_0 \in (-a, a)$ is a fixed number, show that
\[
f_{Y \mid X+Y}(y \mid u_0) =
\begin{cases}
\dfrac{f(y)}{F(u_0 + a) - F(u_0 - a)} & \text{if } u_0 - a < y < u_0 + a,\\[3mm]
0 & \text{otherwise,}
\end{cases}
\]
where $f_{Y \mid X+Y}(y \mid u_0)$ is the conditional density function of $Y$, given $X + Y = u_0$.
4. Let $X$ and $Y$ be iid RVs with common PDF
\[
f(x) =
\begin{cases}
1 & \text{if } 0 < x < 1,\\
0 & \text{otherwise.}
\end{cases}
\]
Find the PDFs of RVs $XY$, $X/Y$, $\min\{X, Y\}$, $\max\{X, Y\}$, $\min\{X, Y\}/\max\{X, Y\}$.


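5. Let $X_1, X_2, X_3$ be iid RVs with common density function
\[
f(x) =
\begin{cases}
1 & \text{if } 0 < x < 1,\\
0 & \text{otherwise.}
\end{cases}
\]
Show that the PDF of $U = X_1 + X_2 + X_3$ is given by
\[
g(u) =
\begin{cases}
\dfrac{u^2}{2}, & 0 \le u < 1,\\[2mm]
3u - u^2 - \dfrac{3}{2}, & 1 \le u < 2,\\[2mm]
\dfrac{(u - 3)^2}{2}, & 2 \le u \le 3,\\[2mm]
0, & \text{elsewhere.}
\end{cases}
\]
An extension to the $n$-variate case holds.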


6. Let $X$ and $Y$ be independent RVs with common geometric PMF
\[
P\{X = k\} = \pi (1 - \pi)^k, \qquad k = 0, 1, 2, \dots; \quad 0 < \pi < 1.
\]
Also, let $M = \max\{X, Y\}$. Find the joint distribution of $M$ and $X$, the marginal
distribution of $M$, and the conditional distribution of $X$, given $M$.


7. Let $X$ be a nonnegative RV of the continuous type. The integral part, $Y$, of $X$ is
distributed with PMF $P\{Y = k\} = \lambda^k e^{-\lambda}/k!$, $k = 0, 1, 2, \dots$, $\lambda > 0$; and the
fractional part, $Z$, of $X$ has PDF $f_1(z) = 1$ if $0 \le z \le 1$, and $= 0$ otherwise.
Find the PDF of $X$, assuming that $Y$ and $Z$ are independent.


8. Let $X$ and $Y$ be independent RVs. If at least one of $X$ and $Y$ is of the continuous
type, show that $X + Y$ is also continuous. What if $X$ and $Y$ are not independent?

9. Let $X$ and $Y$ be independent integer-valued RVs. Show that
\[
P(t) = P_X(t)\, P_Y(t),
\]
where $P$, $P_X$, and $P_Y$, respectively, are the PGFs of $X + Y$, $X$, and $Y$.


10. Let $X$ and $Y$ be independent nonnegative RVs of the continuous type with PDFs
$f$ and $g$, respectively. Let $f(x) = e^{-x}$ if $x > 0$, and $= 0$ if $x \le 0$, and let $g$
be arbitrary. Show that the MGF $M(t)$ of $Y$, which is assumed to exist, has the
property that the DF of $X/Y$ is $1 - M(-z)$.
11. Let $X, Y, Z$ have the joint PDF
\[
f(x, y, z) =
\begin{cases}
6(1 + x + y + z)^{-4} & \text{if } 0 < x,\ 0 < y,\ 0 < z,\\
0 & \text{otherwise.}
\end{cases}
\]
Find the PDF of $U = X + Y + Z$.
12. Let $X$ and $Y$ be iid RVs with common PDF
\[
f(x) =
\begin{cases}
(x\sqrt{2\pi})^{-1} e^{-(1/2)(\log x)^2}, & x > 0,\\
0, & x \le 0.
\end{cases}
\]
Find the PDF of $Z = XY$.


13. Let X and Y be iid RVs with common PDF f defined in Example 8. Find the 
joint PDF of U and V in the following cases: 
(a) $U = \sqrt{X^2 + Y^2}$, $V = \tan^{-1}(X/Y)$, $-\pi/2 < V \le \pi/2$.
(b) $U = (X + Y)/2$, $V = (X - Y)^2/2$.

14. Construct an example to show that even when the MGF of $X + Y$ can be written
as a product of the MGF of $X$ and the MGF of $Y$, $X$ and $Y$ need not be
independent.


15. Let $X_1, X_2, \dots, X_n$ be iid with common PDF
\[
f(x) = \frac{1}{b - a}, \qquad a < x < b, \qquad = 0 \text{ otherwise.}
\]
Using the distribution function technique, show that:

(a) The joint PDF of $X_{(n)} = \max(X_1, X_2, \dots, X_n)$ and $X_{(1)} = \min(X_1, X_2, \dots, X_n)$
is given by
\[
u(x, y) = \frac{n(n - 1)(x - y)^{n-2}}{(b - a)^n}, \qquad a < y < x < b,
\]
and $= 0$ otherwise.
(b) The PDF of $X_{(n)}$ is given by
\[
g(z) = \frac{n(z - a)^{n-1}}{(b - a)^n}, \qquad a < z < b, \qquad = 0 \text{ otherwise,}
\]
and that of $X_{(1)}$ by
\[
h(z) = \frac{n(b - z)^{n-1}}{(b - a)^n}, \qquad a < z < b, \qquad = 0 \text{ otherwise.}
\]


16. Let $X_1, X_2$ be iid with common Poisson PMF
\[
P(X_i = x) = e^{-\lambda} \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \dots, \quad i = 1, 2,
\]
where $\lambda > 0$ is a constant. Let $X_{(2)} = \max(X_1, X_2)$ and $X_{(1)} = \min(X_1, X_2)$.
Find the PMF of $X_{(2)}$.
17. Let $X$ have the binomial PMF
\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \dots, n; \quad 0 < p < 1.
\]
Let $Y$ be independent of $X$ and $Y \stackrel{d}{=} X$. Find the PMF of $U = X + Y$ and
$W = X - Y$.


4.5 COVARIANCE, CORRELATION, AND MOMENTS 
Let $X$ and $Y$ be jointly distributed on $(\Omega, \mathcal{S}, P)$. In Section 4.4 we defined $E g(X, Y)$
for Borel functions $g$ on $\mathcal{R}_2$. Functions of the form $g(x, y) = x^j y^k$, where $j$ and $k$
are nonnegative integers, are of interest in probability and statistics.


Definition 1. If $E|X^j Y^k| < \infty$ for nonnegative integers $j$ and $k$, we call
$E(X^j Y^k)$ a moment of order $(j + k)$ of $(X, Y)$ and write
\[
m_{jk} = E(X^j Y^k). \tag{1}
\]
Clearly,
\[
m_{10} = EX, \quad m_{01} = EY, \quad m_{20} = EX^2, \quad m_{11} = E(XY), \quad \text{and} \quad m_{02} = EY^2. \tag{2}
\]

Definition 2. If $E\bigl|(X - EX)^j (Y - EY)^k\bigr| < \infty$ for nonnegative integers $j$ and
$k$, we call $E\{(X - EX)^j (Y - EY)^k\}$ a central moment of order $(j + k)$ and write
\[
\mu_{jk} = E\{(X - EX)^j (Y - EY)^k\}. \tag{3}
\]
Clearly,
\[
\mu_{10} = \mu_{01} = 0, \quad \mu_{20} = \operatorname{var}(X), \quad \mu_{02} = \operatorname{var}(Y), \quad \text{and} \quad
\mu_{11} = E[(X - m_{10})(Y - m_{01})]. \tag{4}
\]
We see easily that
\[
\mu_{11} = E(XY) - EX\,EY. \tag{5}
\]
Note that if $X$ and $Y$ increase (or decrease) together, then $(X - EX)(Y - EY)$ should
be positive, whereas if $X$ decreases while $Y$ increases (and conversely), the product
should be negative. Hence the average value of $(X - EX)(Y - EY)$, namely $\mu_{11}$,
provides a measure of association or joint variation between $X$ and $Y$.


Definition 3. If $E[(X - EX)(Y - EY)]$ exists, we call it the covariance between
$X$ and $Y$ and write
\[
\operatorname{cov}(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - EX\,EY. \tag{6}
\]


Recall (Theorem 3.2.8) that $E(Y - a)^2$ is minimized when we choose $a = EY$
so that $EY$ may be interpreted as the best constant predictor of $Y$. If, instead, we
choose to predict $Y$ by a linear function of $X$, say $aX + b$, and measure the error
in this prediction by $E(Y - aX - b)^2$, we should choose $a$ and $b$ to minimize this
mean square error. Clearly, $E(Y - aX - b)^2$ is minimized, for any $a$, by choosing
$b = E(Y - aX) = EY - aEX$. With this choice of $b$, we find $a$ such that
\[
E(Y - aX - b)^2 = E\{(Y - EY) - a(X - EX)\}^2 = \sigma_Y^2 - 2a\mu_{11} + a^2 \sigma_X^2
\]
is minimum. An easy computation shows that the minimum occurs if we choose
\[
a = \frac{\mu_{11}}{\sigma_X^2}, \tag{7}
\]
provided that $\sigma_X^2 > 0$. Moreover,
\[
\min_{a, b} E(Y - aX - b)^2 = \min_{a} \{\sigma_Y^2 - 2a\mu_{11} + a^2 \sigma_X^2\}
= \sigma_Y^2 - \frac{\mu_{11}^2}{\sigma_X^2}. \tag{8}
\]
Let us write
\[
\rho = \frac{\mu_{11}}{\sigma_X \sigma_Y}. \tag{9}
\]
Then (8) shows that predicting $Y$ by a linear function of $X$ reduces the prediction
error from $\sigma_Y^2$ to $\sigma_Y^2 (1 - \rho^2)$. We may therefore think of $\rho$ as a measure of the linear
dependence between RVs $X$ and $Y$.
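To make (7) and (8) concrete, the sketch below computes the best linear predictor for a small discrete joint distribution in exact arithmetic and returns both the achieved mean square error and the value predicted by (8). The function name `linear_predictor` and the dictionary representation of the joint PMF are assumptions made for illustration.

```python
from fractions import Fraction


def linear_predictor(joint):
    """joint maps (x, y) -> P{X = x, Y = y}.

    Returns (a, b, mse, bound) where a is given by (7), b = EY - a EX,
    mse = E(Y - aX - b)^2, and bound is the right-hand side of (8).
    """
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    var_x = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())
    mu11 = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    a = mu11 / var_x            # requires var(X) > 0
    b = ey - a * ex
    mse = sum(p * (y - a * x - b) ** 2 for (x, y), p in joint.items())
    bound = var_y - mu11 ** 2 / var_x
    return a, b, mse, bound
```

For the four-point distribution putting mass 1/4 on each of (0, 0), (1, 0), (1, 1), (2, 3), this gives $a = 3/2$, $b = -1/2$, and a mean square error of $3/8$, which matches $\sigma_Y^2 - \mu_{11}^2/\sigma_X^2$.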


Definition 4. If $EX^2$, $EY^2$ exist, we define the correlation coefficient between
$X$ and $Y$ as
\[
\rho = \frac{\operatorname{cov}(X, Y)}{\operatorname{SD}(X)\operatorname{SD}(Y)}
= \frac{E(XY) - EX\,EY}{\sqrt{EX^2 - (EX)^2}\, \sqrt{EY^2 - (EY)^2}}, \tag{10}
\]
where $\operatorname{SD}(X)$ denotes the standard deviation of RV $X$.


We note that for any two real numbers $a$ and $b$,
\[
|ab| \le \frac{a^2 + b^2}{2},
\]
so that $E|XY| < \infty$ if $EX^2 < \infty$ and $EY^2 < \infty$.


Definition 5. We say that RVs $X$ and $Y$ are uncorrelated if $\rho = 0$, or equivalently,
$\operatorname{cov}(X, Y) = 0$.


If $X$ and $Y$ are independent, then from (5) $\operatorname{cov}(X, Y) = 0$, and $X$ and $Y$ are
uncorrelated. If, however, $\rho = 0$, then $X$ and $Y$ may not necessarily be independent.


Example 1. Let $U$ and $V$ be two RVs with common mean and common variance.
Let $X = U + V$ and $Y = U - V$. Then
\[
\operatorname{cov}(X, Y) = E(U^2 - V^2) - E(U + V)\,E(U - V) = 0,
\]
so that $X$ and $Y$ are uncorrelated but not necessarily independent (see Example
4.4.9).


Let us now study some properties of the correlation coefficient. From the
definition we see that $\rho$ [and also $\operatorname{cov}(X, Y)$] is symmetric in $X$ and $Y$.


Theorem 1

(a) The correlation coefficient $\rho$ between two RVs $X$ and $Y$ satisfies
\[
|\rho| \le 1. \tag{11}
\]
(b) The equality $|\rho| = 1$ holds if and only if there exist constants $a \ne 0$ and $b$
such that $P\{Y = aX + b\} = 1$.

Proof. From (8), since $E(Y - aX - b)^2 \ge 0$, we must have $1 - \rho^2 \ge 0$, or
equivalently, (11) holds.

Equality in (11) holds if and only if $\rho^2 = 1$, or equivalently, $E(Y - aX - b)^2 = 0$
holds. This implies and is implied by $P(Y = aX + b) = 1$. Here $a \ne 0$.


Remark 1. From (7) and (9) we note that the signs of $a$ and $\rho$ are the same, so if
$\rho = 1$, then $P(Y = aX + b) = 1$ where $a > 0$, and if $\rho = -1$, then $a < 0$.


Theorem 2. Let $EX^2 < \infty$, $EY^2 < \infty$, and let $U = aX + b$, $V = cY + d$. Then
\[
\rho_{X,Y} = \pm \rho_{U,V},
\]
where $\rho_{X,Y}$ and $\rho_{U,V}$, respectively, are the correlation coefficients between $X$ and $Y$
and $U$ and $V$.


The proof is simple and is left as an exercise. 


Example 2. Let $X, Y$ be identically distributed with common PMF
\[
P\{X = k\} = \frac{1}{N}, \qquad k = 1, 2, \dots, N \quad (N > 1).
\]
Then
\[
EX = EY = \frac{N + 1}{2}, \qquad EX^2 = EY^2 = \frac{(N + 1)(2N + 1)}{6},
\]
so that
\[
\operatorname{var}(X) = \operatorname{var}(Y) = \frac{N^2 - 1}{12}.
\]
Also,
\[
E(XY) = \tfrac{1}{2}\bigl[EX^2 + EY^2 - E(X - Y)^2\bigr]
= \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2}.
\]
Thus
\[
\operatorname{cov}(X, Y) = \frac{(N + 1)(2N + 1)}{6} - \frac{E(X - Y)^2}{2} - \frac{(N + 1)^2}{4}
= \frac{(N + 1)(N - 1)}{12} - \frac{1}{2} E(X - Y)^2,
\]
and
\[
\rho_{X,Y} = \frac{(N^2 - 1)/12 - E(X - Y)^2/2}{(N^2 - 1)/12}
= 1 - \frac{6 E(X - Y)^2}{N^2 - 1}.
\]
If $P\{X = Y\} = 1$, then $\rho = 1$, and conversely. If $P\{Y = N + 1 - X\} = 1$, then
\[
E(X - Y)^2 = E(2X - N - 1)^2
= 4\,\frac{(N + 1)(2N + 1)}{6} - 4\,\frac{(N + 1)^2}{2} + (N + 1)^2 = \frac{N^2 - 1}{3},
\]
and it follows that $\rho_{X,Y} = -1$. Conversely, if $\rho_{X,Y} = -1$, from Remark 1 it follows
that $Y = -aX + b$ with probability 1 for some $a > 0$ and some real number $b$. To
find $a$ and $b$, we note that $EY = -aEX + b$, so that $b = [(N + 1)/2](1 + a)$. Also,
$EY^2 = E(b - aX)^2$, which yields
\[
(1 - a^2) EX^2 + 2ab\,EX - b^2 = 0.
\]
Substituting for $b$ in terms of $a$ and the values of $EX^2$ and $EX$, we see that $a^2 = 1$,
so that $a = 1$. Hence $b = N + 1$, and it follows that $Y = N + 1 - X$ with probability 1.


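Example 3. Let $(X, Y)$ be jointly distributed with density function
\[
f(x, y) =
\begin{cases}
x + y, & 0 < x < 1, \quad 0 < y < 1,\\
0, & \text{otherwise.}
\end{cases}
\]
Then
\[
E(X^l Y^m) = \int_0^1 \int_0^1 x^l y^m (x + y)\, dx\, dy
= \int_0^1 \int_0^1 x^{l+1} y^m\, dx\, dy + \int_0^1 \int_0^1 x^l y^{m+1}\, dx\, dy
\]
\[
= \frac{1}{(l + 2)(m + 1)} + \frac{1}{(l + 1)(m + 2)},
\]
where $l$ and $m$ are positive integers. Thus
\[
EX = EY = \tfrac{7}{12}, \qquad EX^2 = EY^2 = \tfrac{5}{12},
\]
\[
\operatorname{var}(X) = \operatorname{var}(Y) = \tfrac{5}{12} - \tfrac{49}{144} = \tfrac{11}{144},
\]
and
\[
\operatorname{cov}(X, Y) = \tfrac{1}{3} - \tfrac{49}{144} = -\tfrac{1}{144}, \qquad \rho = -\tfrac{1}{11}.
\]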
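The arithmetic in Example 3 is easy to reproduce exactly from the closed form for $E(X^l Y^m)$. The short sketch below does so with rational numbers; the function name `moment` and the variable names are illustrative assumptions.

```python
from fractions import Fraction


def moment(l, m):
    """E(X^l Y^m) for f(x, y) = x + y on the unit square (closed form above)."""
    return Fraction(1, (l + 2) * (m + 1)) + Fraction(1, (l + 1) * (m + 2))


ex = moment(1, 0)                                   # EX = EY by symmetry
var_x = moment(2, 0) - ex ** 2                      # var(X) = var(Y)
cov_xy = moment(1, 1) - moment(1, 0) * moment(0, 1)
rho = cov_xy / var_x                                # SD(X) SD(Y) = var(X) here
```

The closed form is also valid for $l = 0$ or $m = 0$, which is what allows `moment(1, 0)` and `moment(2, 0)` to be used for the marginal moments.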
Theorem 3. Let $X_1, X_2, \dots, X_n$ be RVs such that $E|X_i| < \infty$, $i = 1, 2, \dots, n$.
Let $a_1, a_2, \dots, a_n$ be real numbers, and write
\[
S = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n.
\]
Then $ES$ exists, and we have
\[
ES = \sum_{j=1}^{n} a_j EX_j. \tag{12}
\]
Proof. If $(X_1, X_2, \dots, X_n)$ is of the discrete type, then
\[
ES = \sum_{i_1, i_2, \dots, i_n} (a_1 x_{i_1} + a_2 x_{i_2} + \cdots + a_n x_{i_n})\, P\{X_1 = x_{i_1}, X_2 = x_{i_2}, \dots, X_n = x_{i_n}\}
\]
\[
= a_1 \sum_{i_1} x_{i_1} \sum_{i_2, \dots, i_n} P\{X_1 = x_{i_1}, \dots, X_n = x_{i_n}\}
+ \cdots + a_n \sum_{i_n} x_{i_n} \sum_{i_1, \dots, i_{n-1}} P\{X_1 = x_{i_1}, \dots, X_n = x_{i_n}\}
\]
\[
= a_1 \sum_{i_1} x_{i_1} P\{X_1 = x_{i_1}\} + \cdots + a_n \sum_{i_n} x_{i_n} P\{X_n = x_{i_n}\}
\]
\[
= a_1 EX_1 + \cdots + a_n EX_n.
\]
The existence of $ES$ follows easily by replacing each $a_j$ by $|a_j|$ and each $x_{i_j}$ by
$|x_{i_j}|$ and remembering that $E|X_j| < \infty$, $j = 1, 2, \dots, n$. The case of continuous
type $(X_1, X_2, \dots, X_n)$ is treated similarly.


Corollary. Take $a_1 = a_2 = \cdots = a_n = 1/n$. Then
\[
E\left( \frac{X_1 + X_2 + \cdots + X_n}{n} \right) = \frac{1}{n} \sum_{i=1}^{n} EX_i,
\]
and if $EX_1 = EX_2 = \cdots = EX_n = \mu$, then
\[
E\left( \frac{X_1 + X_2 + \cdots + X_n}{n} \right) = \mu.
\]

Theorem 4. Let $X_1, X_2, \dots, X_n$ be independent RVs such that $E|X_i| < \infty$, $i =
1, 2, \dots, n$. Then $E\bigl(\prod_{i=1}^{n} X_i\bigr)$ exists and
\[
E\left( \prod_{i=1}^{n} X_i \right) = \prod_{i=1}^{n} EX_i. \tag{13}
\]


Let $X$ and $Y$ be independent and $g_1(\cdot)$ and $g_2(\cdot)$ be Borel-measurable functions.
Then we know (Theorem 4.3.2) that $g_1(X)$ and $g_2(Y)$ are independent. If $E[g_1(X)]$,
$E[g_2(Y)]$, and $E[g_1(X) g_2(Y)]$ exist, it follows from Theorem 4 that
\[
E[g_1(X) g_2(Y)] = E[g_1(X)]\, E[g_2(Y)]. \tag{14}
\]
Conversely, if for any Borel sets $A_1$ and $A_2$ we take $g_1(X) = 1$ if $X \in A_1$, and $= 0$
otherwise, and $g_2(Y) = 1$ if $Y \in A_2$, and $= 0$ otherwise, then
\[
E[g_1(X) g_2(Y)] = P\{X \in A_1, Y \in A_2\}
\]
and $E[g_1(X)] = P\{X \in A_1\}$, $E[g_2(Y)] = P\{Y \in A_2\}$. Relation (14) implies that
for any Borel sets $A_1$ and $A_2$ of real numbers
\[
P\{X \in A_1, Y \in A_2\} = P\{X \in A_1\}\, P\{Y \in A_2\}.
\]
It follows that $X$ and $Y$ are independent if (14) holds. We have thus proved the
following theorem.


Theorem 5. Two RVs X and Y are independent if and only if for every pair of 
Borel-measurable functions g; and g2 the relation 


\[
E[g_1(X) g_2(Y)] = E[g_1(X)]\, E[g_2(Y)] \tag{15}
\]
holds, provided that the expectations on both sides of (15) exist.


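Theorem 6. Let $X_1, X_2, \dots, X_n$ be RVs with $E|X_i|^2 < \infty$ for $i = 1, 2, \dots, n$.
Let $a_1, a_2, \dots, a_n$ be real numbers and write $S = \sum_{i=1}^{n} a_i X_i$. Then the variance of
$S$ exists and is given by
\[
\operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j \operatorname{cov}(X_i, X_j). \tag{16}
\]
If, in particular, $X_1, X_2, \dots, X_n$ are such that $\operatorname{cov}(X_i, X_j) = 0$ for $i, j = 1, 2, \dots, n$,
$i \ne j$, then
\[
\operatorname{var}(S) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i). \tag{17}
\]
Proof. We have
\[
\operatorname{var}(S) = E\left( \sum_{i=1}^{n} a_i X_i - \sum_{i=1}^{n} a_i EX_i \right)^2
\]
\[
= E\left[ \sum_{i=1}^{n} a_i^2 (X_i - EX_i)^2 + \sum_{i \ne j} a_i a_j (X_i - EX_i)(X_j - EX_j) \right]
\]
\[
= \sum_{i=1}^{n} a_i^2 E(X_i - EX_i)^2 + \sum_{i \ne j} a_i a_j E[(X_i - EX_i)(X_j - EX_j)].
\]
If the $X_i$'s satisfy
\[
\operatorname{cov}(X_i, X_j) = 0 \qquad \text{for } i, j = 1, 2, \dots, n; \quad i \ne j,
\]
the second term on the right side of (16) vanishes, and we have (17).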


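Corollary 1. Let $X_1, X_2, \dots, X_n$ be exchangeable RVs with $\operatorname{var}(X_i) = \sigma^2$, $i =
1, 2, \dots, n$. Then
\[
\operatorname{var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sigma^2 \sum_{i=1}^{n} a_i^2 + \rho \sigma^2 \sum_{i \ne j} a_i a_j,
\]
where $\rho$ is the correlation coefficient between $X_i$ and $X_j$, $i \ne j$. In particular,
\[
\operatorname{var}\left( \sum_{i=1}^{n} \frac{X_i}{n} \right) = \frac{\sigma^2}{n} + \frac{n - 1}{n}\, \rho \sigma^2.
\]

Corollary 2. If $X_1, X_2, \dots, X_n$ are exchangeable and uncorrelated, then
\[
\operatorname{var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sigma^2 \sum_{i=1}^{n} a_i^2,
\]
and
\[
\operatorname{var}\left( \sum_{i=1}^{n} \frac{X_i}{n} \right) = \frac{\sigma^2}{n}.
\]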


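Theorem 7. Let $X_1, X_2, \dots, X_n$ be iid RVs with common variance $\sigma^2$. Also, let
$a_1, a_2, \dots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$, and let $S = \sum_{i=1}^{n} a_i X_i$. Then
the variance of $S$ is least if we choose $a_i = 1/n$, $i = 1, 2, \dots, n$.


Proof. We have
\[
\operatorname{var}(S) = \sigma^2 \sum_{i=1}^{n} a_i^2,
\]
which is least if and only if we choose the $a_i$'s so that $\sum_{i=1}^{n} a_i^2$ is smallest, subject
to the condition $\sum_{i=1}^{n} a_i = 1$. We have
\[
\sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n} \left( a_i - \frac{1}{n} + \frac{1}{n} \right)^2
= \sum_{i=1}^{n} \left( a_i - \frac{1}{n} \right)^2 + \frac{2}{n} \sum_{i=1}^{n} \left( a_i - \frac{1}{n} \right) + \frac{1}{n}
= \sum_{i=1}^{n} \left( a_i - \frac{1}{n} \right)^2 + \frac{1}{n},
\]
which is minimized for the choice $a_i = 1/n$, $i = 1, 2, \dots, n$.

Note that the result holds if we replace independence by the condition that the $X_i$'s
are exchangeable and uncorrelated.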


Example 4. Suppose that $r$ balls are drawn one at a time without replacement
from a bag containing $n$ white and $m$ black balls. Let $S_r$ be the number of black balls
drawn.

Let us define RVs $X_k$ as follows:
\[
X_k =
\begin{cases}
1 & \text{if the $k$th ball drawn is black,}\\
0 & \text{if the $k$th ball drawn is white,}
\end{cases}
\qquad k = 1, 2, \dots, r.
\]
Then
\[
S_r = X_1 + X_2 + \cdots + X_r.
\]
Also,
\[
P\{X_k = 1\} = \frac{m}{m + n}, \qquad \text{and} \qquad P\{X_k = 0\} = \frac{n}{m + n}. \tag{18}
\]
Thus $EX_k = m/(m + n)$, and
\[
\operatorname{var}(X_k) = \frac{m}{m + n} - \frac{m^2}{(m + n)^2} = \frac{mn}{(m + n)^2}.
\]
To compute $\operatorname{cov}(X_j, X_k)$, $j \ne k$, note that the RV $X_j X_k = 1$ if the $j$th and $k$th balls
drawn are black, and $= 0$ otherwise. Thus
\[
E(X_j X_k) = P\{X_j = 1, X_k = 1\} = \frac{m}{m + n} \cdot \frac{m - 1}{m + n - 1} \tag{19}
\]
and
\[
\operatorname{cov}(X_j, X_k) = -\frac{mn}{(m + n)^2 (m + n - 1)}.
\]
Thus
\[
ES_r = \sum_{k=1}^{r} EX_k = \frac{mr}{m + n}
\]
and
\[
\operatorname{var}(S_r) = r\, \frac{mn}{(m + n)^2} - r(r - 1)\, \frac{mn}{(m + n)^2 (m + n - 1)}
= \frac{mnr}{(m + n)^2 (m + n - 1)}\, (m + n - r).
\]
Readers are asked to satisfy themselves that (18) and (19) hold.
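One way to become convinced of the formulas in Example 4 is to enumerate every ordered draw for small values of $n$, $m$, and $r$ and compare exact moments. The sketch below does this by brute force; the function names `black_count_moments` and `formulas` are hypothetical, and the approach is only practical for a handful of balls.

```python
from fractions import Fraction
from itertools import permutations


def black_count_moments(n_white, m_black, r):
    """Exact mean and variance of S_r by enumerating all ordered draws."""
    balls = [1] * m_black + [0] * n_white      # 1 = black, 0 = white
    # permutations treats equal entries as distinct balls, so every
    # ordered draw of r balls is listed once and is equally likely
    counts = [sum(draw) for draw in permutations(balls, r)]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean ** 2
    return mean, var


def formulas(n_white, m_black, r):
    """Closed forms for E S_r and var(S_r) from Example 4."""
    big_n = m_black + n_white
    mean = Fraction(m_black * r, big_n)
    var = Fraction(m_black * n_white * r * (big_n - r), big_n ** 2 * (big_n - 1))
    return mean, var
```

For $n = 2$ white balls, $m = 3$ black balls, and $r = 3$ draws, both functions should return a mean of $9/5$ and a variance of $9/25$.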


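Example 5. Let $X_1, X_2, \dots, X_n$ be independent, and $a_1, a_2, \dots, a_n$ be real
numbers such that $\sum a_i = 1$. Assume that $E|X_i^2| < \infty$, $i = 1, 2, \dots, n$, and
let $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \dots, n$. Write $S = \sum_{i=1}^{n} a_i X_i$. Then $\operatorname{var}(S) =
\sum_{i=1}^{n} a_i^2 \sigma_i^2 = \sigma$, say. To find weights $a_i$ such that $\sigma$ is minimum, we write
\[
\sigma = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + (1 - a_1 - a_2 - \cdots - a_{n-1})^2 \sigma_n^2,
\]
and differentiate partially with respect to $a_1, a_2, \dots, a_{n-1}$, respectively. We get
\[
\frac{\partial \sigma}{\partial a_1} = 2a_1 \sigma_1^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1}) \sigma_n^2 = 0,
\]
\[
\vdots
\]
\[
\frac{\partial \sigma}{\partial a_{n-1}} = 2a_{n-1} \sigma_{n-1}^2 - 2(1 - a_1 - a_2 - \cdots - a_{n-1}) \sigma_n^2 = 0.
\]
It follows that
\[
a_j \sigma_j^2 = a_n \sigma_n^2, \qquad j = 1, 2, \dots, n - 1,
\]
that is, the weights $a_j$, $j = 1, 2, \dots, n$, should be chosen proportional to $1/\sigma_j^2$. The
minimum value of $\sigma$ is then
\[
\sigma_{\min} = \sum_{i=1}^{n} \frac{k^2}{\sigma_i^4}\, \sigma_i^2 = k^2 \sum_{i=1}^{n} \frac{1}{\sigma_i^2} = k,
\]
where $k$ is given by $\sum_{j=1}^{n} (k/\sigma_j^2) = 1$. Thus
\[
\sigma_{\min} = \frac{1}{\sum_{j=1}^{n} (1/\sigma_j^2)} = \frac{H}{n},
\]
where $H$ is the harmonic mean of the $\sigma_j^2$.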
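The weighting rule of Example 5 (weights proportional to $1/\sigma_j^2$) can be illustrated with a few lines of exact arithmetic. This is a sketch under the example's assumptions (independent summands, weights summing to 1); the function names `optimal_weights` and `variance_of_combination` are illustrative.

```python
from fractions import Fraction


def optimal_weights(variances):
    """Weights a_j = k / sigma_j^2 with k chosen so that the weights sum to 1.

    Returns (weights, k); k is also the minimum variance of the combination.
    """
    k = 1 / sum(Fraction(1) / v for v in variances)
    return [k / v for v in variances], k


def variance_of_combination(weights, variances):
    """var(sum a_i X_i) = sum a_i^2 sigma_i^2 for independent X_i."""
    return sum(a * a * v for a, v in zip(weights, variances))
```

With variances 1, 2, and 4 the weights come out as 4/7, 2/7, and 1/7, and the resulting variance 4/7 is smaller than the value 7/9 obtained with equal weights.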


We conclude this section with some important moment inequalities. We begin 
with the simple inequality 


\[
|a + b|^r \le c_r (|a|^r + |b|^r), \tag{20}
\]
where $c_r = 1$ for $0 \le r \le 1$, and $= 2^{r-1}$ for $r > 1$. For $r = 0$ and $r = 1$, (20) is
trivially true.

First note that it is sufficient to prove (20) when $0 < a \le b$. Let $0 < a \le b$, and
write $x = a/b$. Then
\[
\frac{(a + b)^r}{a^r + b^r} = \frac{(1 + x)^r}{1 + x^r}.
\]
Writing $f(x) = (1 + x)^r/(1 + x^r)$, we see that
\[
f'(x) = \frac{r(1 + x)^{r-1}}{(1 + x^r)^2}\, (1 - x^{r-1}),
\]
where $0 < x \le 1$. It follows that $f'(x) > 0$ if $r > 1$, $= 0$ if $r = 1$, and $< 0$ if $r < 1$.
Thus
\[
\max_{0 \le x \le 1} f(x) = f(0) = 1 \qquad \text{if } r \le 1,
\]
while
\[
\max_{0 \le x \le 1} f(x) = f(1) = 2^{r-1} \qquad \text{if } r \ge 1.
\]
Note that $|a + b|^r \le 2^r (|a|^r + |b|^r)$ is trivially true since
\[
|a + b| \le \max(2|a|, 2|b|).
\]
An immediate application of (20) is the following result.


Theorem 8. Let $X$ and $Y$ be RVs and $r > 0$ be a fixed number. If $E|X|^r$, $E|Y|^r$
are both finite, so also is $E|X + Y|^r$.


Proof. Let $a = X$ and $b = Y$ in (20). Taking the expectation on both sides, we
see that
\[
E|X + Y|^r \le c_r (E|X|^r + E|Y|^r),
\]
where $c_r = 1$ if $0 < r \le 1$ and $= 2^{r-1}$ if $r > 1$.
Next we establish Hölder's inequality,
\[
|xy| \le \frac{|x|^p}{p} + \frac{|y|^q}{q}, \tag{21}
\]
where $p$ and $q$ are positive real numbers such that $p > 1$ and $1/p + 1/q = 1$. Note
that for $x > 0$ the function $w = \log x$ is concave. It follows that for $x_1, x_2 > 0$ and $0 \le t \le 1$,
\[
\log[t x_1 + (1 - t) x_2] \ge t \log x_1 + (1 - t) \log x_2.
\]
Taking antilogarithms, we get
\[
x_1^t x_2^{1-t} \le t x_1 + (1 - t) x_2.
\]
Now we choose $x_1 = |x|^p$, $x_2 = |y|^q$, $t = 1/p$, $1 - t = 1/q$, where $p > 1$ and
$1/p + 1/q = 1$, to get (21).


Theorem 9. Let $p > 1$, $q > 1$, so that $1/p + 1/q = 1$. Then

$$E|XY| \le (E|X|^p)^{1/p}(E|Y|^q)^{1/q}. \tag{22}$$

Proof. By Hölder's inequality, letting $x = X[E|X|^p]^{-1/p}$, $y = Y[E|Y|^q]^{-1/q}$, we get

$$|XY| \le p^{-1}|X|^p[E|X|^p]^{1/p - 1}[E|Y|^q]^{1/q} + q^{-1}|Y|^q[E|Y|^q]^{1/q - 1}[E|X|^p]^{1/p}.$$

Taking the expectation on both sides leads to (22).

Corollary. Taking $p = q = 2$, we obtain the Cauchy–Schwarz inequality,

$$E|XY| \le E^{1/2}|X|^2\,E^{1/2}|Y|^2.$$
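
As a numerical illustration of (22) and its corollary, the following sketch evaluates both sides of the inequality for a small discrete joint distribution. The joint PMF `pmf` and the helper names are hypothetical choices made for this example only; the case $p = 2$ is the Cauchy–Schwarz inequality.

```python
def expect(pmf, g):
    """E g(X, Y) for a finite joint PMF given as {(x, y): probability}."""
    return sum(prob * g(x, y) for (x, y), prob in pmf.items())

def holder_sides(pmf, p):
    """Return (E|XY|, (E|X|^p)^(1/p) (E|Y|^q)^(1/q)) with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    lhs = expect(pmf, lambda x, y: abs(x * y))
    rhs = (expect(pmf, lambda x, y: abs(x) ** p) ** (1 / p)
           * expect(pmf, lambda x, y: abs(y) ** q) ** (1 / q))
    return lhs, rhs

# Hypothetical joint PMF of (X, Y); the probabilities sum to 1.
pmf = {(-1, 2): 0.2, (0, 1): 0.3, (2, -1): 0.1, (3, 3): 0.4}
for p in (1.5, 2.0, 4.0):
    lhs, rhs = holder_sides(pmf, p)
    print(p, lhs, rhs)
```

For every admissible $p$ the first printed value should not exceed the second.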

The final result of this section is an inequality due to Minkowski.

Theorem 10. For $p \ge 1$,

$$[E|X + Y|^p]^{1/p} \le [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}. \tag{23}$$

Proof. We have, for $p > 1$,

$$|X + Y|^p \le |X|\,|X + Y|^{p-1} + |Y|\,|X + Y|^{p-1}.$$

Taking expectations and using Hölder's inequality with $Y$ replaced by $|X + Y|^{p-1}$ $(p > 1)$, we have

$$\begin{aligned}
E|X + Y|^p &\le [E|X|^p]^{1/p}[E|X + Y|^{(p-1)q}]^{1/q} + [E|Y|^p]^{1/p}[E|X + Y|^{(p-1)q}]^{1/q} \\
&= \{[E|X|^p]^{1/p} + [E|Y|^p]^{1/p}\}\,[E|X + Y|^{(p-1)q}]^{1/q}.
\end{aligned}$$

Excluding the trivial case in which $E|X + Y|^p = 0$, and noting that $(p - 1)q = p$, we have, after dividing both sides of the last inequality by $[E|X + Y|^p]^{1/q}$,

$$[E|X + Y|^p]^{1/p} \le [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}, \qquad p > 1.$$

The case $p = 1$ being trivial, this establishes (23).

PROBLEMS 4.5

1. Suppose that the RV $(X, Y)$ is uniformly distributed over the region $R = \{(x, y): 0 < x < y < 1\}$. Find the covariance between $X$ and $Y$.
2. Let $(X, Y)$ have the joint PDF given by

$$f(x, y) = \begin{cases} x^2 + \dfrac{xy}{3} & \text{if } 0 < x < 1,\ 0 < y < 2, \\ 0 & \text{otherwise.} \end{cases}$$

Find all moments of order 2.

3. Let $(X, Y)$ be distributed with joint density

$$f(x, y) = \begin{cases} \tfrac{1}{4}[1 + xy(x^2 - y^2)] & \text{if } |x| \le 1,\ |y| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Find the MGF of $(X, Y)$. Are $X$, $Y$ independent? If not, find the covariance between $X$ and $Y$.

4. For a positive RV $X$ with finite first moment, show that (a) $E\sqrt{X} \le \sqrt{EX}$ and (b) $E(1/X) \ge 1/EX$.


5. If $X$ is a nondegenerate RV with finite expectation and such that $X \ge a > 0$, then

$$E\{\sqrt{X^2 - a^2}\} < \sqrt{(EX)^2 - a^2}.$$

(Kruskal [54])

6. Show that for $x > 0$,

$$\left(\int_x^\infty t e^{-t^2/2}\,dt\right)^2 \le \int_x^\infty e^{-t^2/2}\,dt \int_x^\infty t^2 e^{-t^2/2}\,dt,$$

and hence that

$$\int_x^\infty e^{-t^2/2}\,dt \ge \tfrac{1}{2}[(4 + x^2)^{1/2} - x]\,e^{-x^2/2}.$$


7. Given a PDF $f$ that is nondecreasing in the interval $a \le x \le b$, show that for any $s > 0$

$$\int_a^b x^{2s} f(x)\,dx \ge \frac{b^{2s+1} - a^{2s+1}}{(2s + 1)(b - a)} \int_a^b f(x)\,dx,$$

with the inequality reversed if $f$ is nonincreasing.

8. Derive the Lyapunov inequality (Theorem 3.4.3)

$$[E|X|^r]^{1/r} \le [E|X|^s]^{1/s}, \qquad 1 < r < s < \infty,$$

from Hölder's inequality (22).


9. Let $X$ be an RV with $E|X|^r < \infty$ for $r > 0$. Show that the function $\log E|X|^r$ is a convex function of $r$.

10. Show with the help of an example that Theorem 9 is not true for $p < 1$.

11. Show that the converse of Theorem 8 also holds for independent RVs; that is, if $E|X + Y|^r < \infty$ for some $r > 0$ and $X$ and $Y$ are independent, then $E|X|^r < \infty$, $E|Y|^r < \infty$. (Hint: Without loss of generality, assume that the median of both $X$ and $Y$ is 0. Show that for any $t > 0$, $P\{|X + Y| > t\} \ge \tfrac{1}{2}P\{|X| > t\}$. Now use the remarks preceding Lemma 3.2.2 to conclude that $E|X|^r < \infty$.)


12. Let $(\Omega, S, P)$ be a probability space and $A_1, A_2, \ldots, A_n$ be events in $S$ such that $P(\bigcup_{k=1}^n A_k) > 0$. Show that

$$2\sum_{1 \le j < k \le n} P(A_jA_k) \ge \frac{\left(\sum_{k=1}^n PA_k\right)^2}{P\left(\bigcup_{k=1}^n A_k\right)} - \sum_{k=1}^n PA_k.$$

(Hint: Let $X_k$ be the indicator function of $A_k$, $k = 1, 2, \ldots, n$. Use the Cauchy–Schwarz inequality.) (Chung and Erdős [13])

13. Let $(\Omega, S, P)$ be a probability space and $A, B \in S$ with $0 < PA < 1$, $0 < PB < 1$. Define $\rho(A, B)$ by $\rho(A, B)$ = correlation coefficient between RVs $I_A$ and $I_B$, where $I_A$, $I_B$ are the indicator functions of $A$ and $B$, respectively. Express $\rho(A, B)$ in terms of $PA$, $PB$, and $P(AB)$, and conclude that $\rho(A, B) = 0$ if and only if $A$ and $B$ are independent. What happens if $A = B$ or if $A = B^c$?

(a) Show that

$$\rho(A, B) > 0 \iff P\{A \mid B\} > P(A) \iff P\{B \mid A\} > P(B)$$

and

$$\rho(A, B) < 0 \iff P\{A \mid B\} < PA \iff P\{B \mid A\} < PB.$$

(b) Show that

$$\rho(A, B) = \frac{P(AB)P(A^cB^c) - P(AB^c)P(A^cB)}{(PA \cdot PA^c \cdot PB \cdot PB^c)^{1/2}}.$$


14. Let $X_1, X_2, \ldots, X_n$ be iid RVs, and define

$$\bar{X} = \frac{\sum_{i=1}^n X_i}{n} \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1}.$$

Suppose that the common distribution is symmetric. Assuming the existence of moments of appropriate order, show that $\operatorname{cov}(\bar{X}, S^2) = 0$.

15. Let $X$, $Y$ be iid RVs with common standard normal density

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Let $U = X + Y$ and $V = X^2 + Y^2$. Find the MGF of the random variable $(U, V)$. Also, find the correlation coefficient between $U$ and $V$. Are $U$ and $V$ independent?

16. Let $X$ and $Y$ be two discrete RVs:

$$P\{X = x_1\} = p_1, \qquad P\{X = x_2\} = 1 - p_1,$$

and

$$P\{Y = y_1\} = p_2, \qquad P\{Y = y_2\} = 1 - p_2.$$

Show that $X$ and $Y$ are independent if and only if the correlation coefficient between $X$ and $Y$ is zero.

17. Let $X$ and $Y$ be dependent RVs with common means 0, variances 1, and correlation coefficient $\rho$. Show that

$$E[\max(X^2, Y^2)] \le 1 + \sqrt{1 - \rho^2}.$$

18. Let $X_1$, $X_2$ be independent normal RVs with density functions

$$f_i(x) = \frac{1}{\sigma_i\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x - \mu_i}{\sigma_i}\right)^2\right\}, \qquad -\infty < x < \infty;\ i = 1, 2.$$

Also let

$$Z = X_1\cos\theta + X_2\sin\theta \quad \text{and} \quad W = X_2\cos\theta - X_1\sin\theta.$$

Find the correlation coefficient, $\rho$, between $Z$ and $W$, and show that

$$0 \le \rho^2 \le \left(\frac{\sigma_1^2 - \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^2.$$
19. Let $(X_1, X_2, \ldots, X_n)$ be an RV such that the correlation coefficient between each pair $X_i$, $X_j$, $i \ne j$, is $\rho$. Show that $-(n - 1)^{-1} \le \rho \le 1$.

20. Let $X_1, X_2, \ldots, X_{m+n}$ be iid RVs with finite second moment. Let $S_k = \sum_{j=1}^k X_j$, $k = 1, 2, \ldots, m + n$. Find the correlation coefficient between $S_n$ and $S_{m+n} - S_m$, where $n > m$.

21. Let $f$ be the PDF of a positive RV, and write

$$g(x, y) = \begin{cases} \dfrac{f(x + y)}{x + y} & \text{if } x > 0,\ y > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Show that $g$ is a density function in the plane. If the $m$th moment of $f$ exists for some positive integer $m$, find $EX^m$. Compute the means and variances of $X$ and $Y$ and the correlation coefficient between $X$ and $Y$ in terms of moments of $f$. (Adapted from Feller [23, p. 100].)

22. A die is thrown $n + 2$ times. After each throw a $+$ sign is recorded for 4, 5, or 6, and a $-$ sign for 1, 2, or 3, the signs forming an ordered sequence. Each sign, except the first and the last, is attached a characteristic RV that assumes the value 1 if both the neighboring signs differ from the one between them, and 0 otherwise. Let $X_1, X_2, \ldots, X_n$ be these characteristic RVs, where $X_i$ corresponds to the $(i + 1)$st sign $(i = 1, 2, \ldots, n)$ in the sequence. Show that

$$E\left(\sum_{i=1}^n X_i\right) = \frac{n}{4} \quad \text{and} \quad \operatorname{var}\left(\sum_{i=1}^n X_i\right) = \frac{5n - 2}{16}.$$

23. Let $(X, Y)$ be jointly distributed with PDF $f$ defined by $f(x, y) = \tfrac{1}{2}$ inside the square with corners at the points $(0, 1)$, $(1, 0)$, $(-1, 0)$, $(0, -1)$ in the $(x, y)$-plane, and $f(x, y) = 0$ otherwise. Are $X$, $Y$ independent? Are they uncorrelated?


4.6 CONDITIONAL EXPECTATION 


In Section 4.2 we defined the conditional distribution of an RV $X$, given $Y$. We showed that if $(X, Y)$ is of the discrete type, the conditional PMF of $X$, given $Y = y_j$, where $P\{Y = y_j\} > 0$, is a PMF when considered as a function of the $x_i$'s (for fixed $y_j$). Similarly, if $(X, Y)$ is an RV of the continuous type with PDF $f(x, y)$ and marginal densities $f_1$ and $f_2$, respectively, then at every point $(x, y)$ at which $f$ is continuous and at which $f_2(y) > 0$ and is continuous, a conditional density function of $X$, given $Y$, exists and may be defined by

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_2(y)}.$$

We also showed that $f_{X|Y}(x \mid y)$, for fixed $y$, when considered as a function of $x$ is a PDF in its own right. Therefore, we can (and do) consider the moments of this conditional distribution.


Definition 1. Let $X$ and $Y$ be RVs defined on a probability space $(\Omega, S, P)$, and let $h$ be a Borel-measurable function. Then the conditional expectation of $h(X)$, given $Y$, written as $E\{h(X) \mid Y\}$, is an RV that takes the value $E\{h(X) \mid y\}$, defined by

$$E\{h(X) \mid y\} = \begin{cases} \displaystyle\sum_x h(x)P\{X = x \mid Y = y\} & \text{if } (X, Y) \text{ is of the discrete type and } P\{Y = y\} > 0, \\[2ex] \displaystyle\int_{-\infty}^{\infty} h(x) f_{X|Y}(x \mid y)\,dx & \text{if } (X, Y) \text{ is of the continuous type and } f_2(y) > 0, \end{cases} \tag{1}$$

when the RV $Y$ assumes the value $y$.


Needless to say, a similar definition may be given for the conditional expectation 
E{h(Y) | X}. 

It is immediate that E{h(X) | Y} satisfies the usual properties of an expectation 
provided we remember that E{h(X) | Y} is not a constant but an RV. The following 
results are easy to prove. We assume the existence of indicated expectations. 


$$E\{c \mid Y\} = c \quad \text{for any constant } c \tag{2}$$

and

$$E\{[a_1g_1(X) + a_2g_2(X)] \mid Y\} = a_1E\{g_1(X) \mid Y\} + a_2E\{g_2(X) \mid Y\}, \tag{3}$$

for any Borel functions $g_1$, $g_2$.

$$P(X \ge 0) = 1 \implies E\{X \mid Y\} \ge 0 \tag{4}$$

and

$$P(X_1 \ge X_2) = 1 \implies E\{X_1 \mid Y\} \ge E\{X_2 \mid Y\}. \tag{5}$$

The statements in (3), (4), and (5) should be understood to hold with probability 1.

$$E\{X \mid Y\} = E(X), \qquad E\{Y \mid X\} = E(Y) \tag{6}$$

for independent RVs $X$ and $Y$.
If $\phi(X, Y)$ is a function of $X$ and $Y$, then

$$E\{\phi(X, Y) \mid y\} = E\{\phi(X, y) \mid y\}, \tag{7}$$

and

$$E\{\psi(X)\phi(X, Y) \mid X\} = \psi(X)E\{\phi(X, Y) \mid X\} \tag{8}$$

for any Borel function $\psi$.

Again, (8) should be understood as holding with probability 1. Relation (7) is useful as a computational device. See Example 3 below.

The moments of a conditional distribution are defined in the usual manner. Thus, 
for $r > 0$, $E\{X^r \mid Y\}$ defines the $r$th moment of the conditional distribution. We
can define the central moments of the conditional distribution and, in particular, the 
variance. There is no difficulty in generalizing these concepts for n-dimensional dis- 
tributions when n > 2. We leave the reader to furnish the details. 


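Example 1. An urn contains three red and two green balls. A random sample of two balls is drawn (a) with replacement, and (b) without replacement. Let $X = 0$ if the first ball drawn is green, $= 1$ if the first ball drawn is red, and let $Y = 0$ if the second ball drawn is green, $= 1$ if the second ball drawn is red.

The joint PMF of $(X, Y)$ is given in the following tables:

(a) With replacement

                X = 0    X = 1    P{Y = y}
    Y = 0       4/25     6/25     2/5
    Y = 1       6/25     9/25     3/5
    P{X = x}    2/5      3/5      1

(b) Without replacement

                X = 0    X = 1    P{Y = y}
    Y = 0       2/20     6/20     2/5
    Y = 1       6/20     6/20     3/5
    P{X = x}    2/5      3/5      1

The conditional PMFs and the conditional expectations are as follows:

(a)
$$P\{X = x \mid 0\} = \begin{cases} \tfrac{2}{5}, & x = 0, \\ \tfrac{3}{5}, & x = 1, \end{cases} \qquad P\{Y = y \mid 0\} = \begin{cases} \tfrac{2}{5}, & y = 0, \\ \tfrac{3}{5}, & y = 1, \end{cases}$$

$$P\{X = x \mid 1\} = \begin{cases} \tfrac{2}{5}, & x = 0, \\ \tfrac{3}{5}, & x = 1, \end{cases} \qquad P\{Y = y \mid 1\} = \begin{cases} \tfrac{2}{5}, & y = 0, \\ \tfrac{3}{5}, & y = 1, \end{cases}$$

$$E\{X \mid Y\} = \begin{cases} \tfrac{3}{5}, & y = 0, \\ \tfrac{3}{5}, & y = 1, \end{cases} \qquad E\{Y \mid X\} = \begin{cases} \tfrac{3}{5}, & x = 0, \\ \tfrac{3}{5}, & x = 1; \end{cases}$$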
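(b)
$$P\{X = x \mid 0\} = \begin{cases} \tfrac{1}{4}, & x = 0, \\ \tfrac{3}{4}, & x = 1, \end{cases} \qquad P\{Y = y \mid 0\} = \begin{cases} \tfrac{1}{4}, & y = 0, \\ \tfrac{3}{4}, & y = 1, \end{cases}$$

$$P\{X = x \mid 1\} = \begin{cases} \tfrac{1}{2}, & x = 0, \\ \tfrac{1}{2}, & x = 1, \end{cases} \qquad P\{Y = y \mid 1\} = \begin{cases} \tfrac{1}{2}, & y = 0, \\ \tfrac{1}{2}, & y = 1, \end{cases}$$

$$E\{X \mid Y\} = \begin{cases} \tfrac{3}{4}, & y = 0, \\ \tfrac{1}{2}, & y = 1, \end{cases} \qquad E\{Y \mid X\} = \begin{cases} \tfrac{3}{4}, & x = 0, \\ \tfrac{1}{2}, & x = 1. \end{cases}$$

Example 2. For the RV $(X, Y)$ considered in Examples 4.2.5 and 4.2.7,

$$E\{Y \mid x\} = \int_x^1 y f_{Y|X}(y \mid x)\,dy = \frac{1}{2}\,\frac{1 - x^2}{1 - x} = \frac{1 + x}{2}, \qquad 0 < x < 1,$$

and

$$E\{X \mid y\} = \int_0^y x f_{X|Y}(x \mid y)\,dx = \frac{y}{2}, \qquad 0 < y < 1.$$

Also,

$$E\{X^2 \mid y\} = \int_0^y x^2\,\frac{1}{y}\,dx = \frac{y^2}{3}, \qquad 0 < y < 1,$$

and

$$\operatorname{var}\{X \mid y\} = E\{X^2 \mid y\} - [E\{X \mid y\}]^2 = \frac{y^2}{3} - \frac{y^2}{4} = \frac{y^2}{12}, \qquad 0 < y < 1.$$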
Theorem 1. Let $Eh(X)$ exist. Then

$$Eh(X) = E\{E\{h(X) \mid Y\}\}. \tag{9}$$

Proof. Let $(X, Y)$ be of the discrete type. Then

$$\begin{aligned}
E\{E\{h(X) \mid Y\}\} &= \sum_y \left[\sum_x h(x)P\{X = x \mid Y = y\}\right] P\{Y = y\} \\
&= \sum_y \left[\sum_x h(x)P\{X = x, Y = y\}\right] \\
&= \sum_x h(x) \sum_y P\{X = x, Y = y\} \\
&= Eh(X).
\end{aligned}$$

The proof in the continuous case is similar.

Theorem 1 is quite useful in computation of $Eh(X)$ in many applications.
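
Relation (9) can be checked directly on a small discrete example. The sketch below uses the joint PMF of drawing two balls without replacement from an urn with three red and two green balls, with $X$ and $Y$ indicating a red ball on the first and second draws; the dictionary layout and function names are assumptions made for illustration only.

```python
from fractions import Fraction as Fr

# Joint PMF {(x, y): P{X = x, Y = y}} for sampling without replacement
# from 3 red and 2 green balls (1 = red, 0 = green).
joint = {(0, 0): Fr(1, 10), (0, 1): Fr(3, 10),
         (1, 0): Fr(3, 10), (1, 1): Fr(3, 10)}

def marginal_y(joint, y):
    """P{Y = y}."""
    return sum(p for (_, b), p in joint.items() if b == y)

def cond_exp_x_given_y(joint, y):
    """E{X | Y = y} from the discrete case of the definition."""
    return sum(a * p for (a, b), p in joint.items() if b == y) / marginal_y(joint, y)

# E{E{X | Y}}: average the conditional expectation over the distribution of Y.
tower = sum(cond_exp_x_given_y(joint, y) * marginal_y(joint, y) for y in (0, 1))
ex = sum(a * p for (a, _), p in joint.items())
print(cond_exp_x_given_y(joint, 0), cond_exp_x_given_y(joint, 1), tower == ex)
```

Exact rational arithmetic is used so that the two sides of (9) can be compared with `==` rather than with a tolerance.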
Example 3. Let $X$ and $Y$ be independent continuous RVs with respective PDFs $f$ and $g$ and DFs $F$ and $G$. Then $P\{X < Y\}$ is of interest in many statistical applications. In view of Theorem 1,

$$P\{X < Y\} = E\{I_{[X<Y]}\} = E\{E\{I_{[X<Y]} \mid Y\}\},$$

where $I_A$ is the indicator function of event $A$. Now

$$E\{I_{[X<Y]} \mid Y = y\} = E\{I_{[X<y]} \mid y\} = E(I_{[X<y]}) = F(y),$$

and it follows that

$$P\{X < Y\} = E\{F(Y)\} = \int_{-\infty}^{\infty} F(y)g(y)\,dy.$$

If, in particular, $X \overset{d}{=} Y$, then

$$P\{X < Y\} = \int_{-\infty}^{\infty} F(y)f(y)\,dy = \tfrac{1}{2}.$$

More generally,

$$P\{X - Y \le z\} = E\{E\{I_{[X-Y \le z]} \mid Y\}\} = E[F(Y + z)] = \int_{-\infty}^{\infty} F(y + z)g(y)\,dy$$

gives the DF of $Z = X - Y$ as computed in the corollary to Theorem 4.4.3.


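Example 4. Consider the joint PDF

$$f(x, y) = xe^{-x(1+y)}, \qquad x \ge 0,\ y \ge 0, \text{ and zero otherwise}$$

of $(X, Y)$. Then

$$f_X(x) = e^{-x}, \qquad x \ge 0, \text{ and zero otherwise}$$

and

$$f_Y(y) = \frac{1}{(1 + y)^2}, \qquad y \ge 0, \text{ and zero otherwise.}$$

Clearly, $EY$ does not exist but

$$E\{Y \mid x\} = \int_0^\infty yxe^{-xy}\,dy = \frac{1}{x}.$$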


Theorem 2. If $EX^2 < \infty$, then

$$\operatorname{var}(X) = \operatorname{var}(E\{X \mid Y\}) + E(\operatorname{var}\{X \mid Y\}). \tag{10}$$

Proof. The right-hand side of (10) equals, by definition,

$$\begin{aligned}
&\{E(E\{X \mid Y\})^2 - [E(E\{X \mid Y\})]^2\} + E(E\{X^2 \mid Y\} - (E\{X \mid Y\})^2) \\
&\qquad = \{E(E\{X \mid Y\})^2 - (EX)^2\} + EX^2 - E(E\{X \mid Y\})^2 \\
&\qquad = \operatorname{var}(X).
\end{aligned}$$

Corollary. If $EX^2 < \infty$, then

$$\operatorname{var}(X) \ge \operatorname{var}(E\{X \mid Y\}) \tag{11}$$

with equality if and only if $X$ is a function of $Y$.

Equation (11) follows immediately from (10). The equality in (11) holds if and only if

$$E(\operatorname{var}\{X \mid Y\}) = E(X - E\{X \mid Y\})^2 = 0,$$

which holds if and only if with probability 1

$$X = E\{X \mid Y\}. \tag{12}$$


Example 5. Let $X_1, X_2, \ldots$ be iid RVs and let $N$ be a positive integer-valued RV. Let $S_N = \sum_{k=1}^N X_k$ and suppose that the $X$'s and $N$ are independent. Then

$$E(S_N) = E\{E\{S_N \mid N\}\}.$$

Now

$$E\{S_N \mid N = n\} = E\{S_n \mid N = n\} = nEX_1,$$

so that

$$E(S_N) = E(N\,EX_1) = (EN)(EX_1).$$

Again, we have assumed above and below that all indicated expectations exist. Also,

$$\operatorname{var}(S_N) = \operatorname{var}(E\{S_N \mid N\}) + E(\operatorname{var}\{S_N \mid N\}).$$

First,

$$\operatorname{var}(E\{S_N \mid N\}) = \operatorname{var}(N\,EX_1) = (EX_1)^2\operatorname{var}(N).$$

Second,

$$\operatorname{var}\{S_N \mid N = n\} = n\operatorname{var}(X_1),$$

so

$$E(\operatorname{var}\{S_N \mid N\}) = (EN)\operatorname{var}(X_1).$$

It follows that

$$\operatorname{var}(S_N) = (EX_1)^2\operatorname{var}(N) + (EN)\operatorname{var}(X_1).$$
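
The two formulas of Example 5 can be verified exactly for small discrete distributions by building the PMF of $S_N$ through repeated convolution. In the sketch below the PMFs `pmf_x` and `pmf_n` are hypothetical, and the helper names are introduced only for this illustration.

```python
from fractions import Fraction as Fr

def convolve(a, b):
    """PMF of the sum of two independent discrete RVs given as {value: prob}."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def mean_var(pmf):
    """Mean and variance of a finite PMF {value: prob}."""
    m = sum(x * p for x, p in pmf.items())
    v = sum(x * x * p for x, p in pmf.items()) - m * m
    return m, v

def random_sum_pmf(pmf_x, pmf_n):
    """PMF of S_N = X_1 + ... + X_N with N >= 1 independent of the iid X's."""
    total = {}
    s_n = {0: Fr(1)}  # PMF of the empty sum S_0
    for n in range(1, max(pmf_n) + 1):
        s_n = convolve(s_n, pmf_x)  # now the PMF of S_n
        weight = pmf_n.get(n, 0)    # P{N = n}
        for s, p in s_n.items():
            total[s] = total.get(s, 0) + weight * p
    return total

# Hypothetical distributions of X_1 and N.
pmf_x = {0: Fr(1, 2), 1: Fr(1, 3), 4: Fr(1, 6)}
pmf_n = {1: Fr(1, 4), 2: Fr(1, 4), 5: Fr(1, 2)}

ex, vx = mean_var(pmf_x)
en, vn = mean_var(pmf_n)
es, vs = mean_var(random_sum_pmf(pmf_x, pmf_n))
print(es == en * ex, vs == ex ** 2 * vn + en * vx)
```

Because the computation uses exact fractions, both comparisons test the identities themselves and not a floating-point approximation.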
PROBLEMS 4.6 


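1. Let $X$ be an RV with PDF given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0.$$

Find $E\{X \mid a < X < b\}$, where $a$ and $b$ are constants.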


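2. (a) Let $(X, Y)$ be jointly distributed with density

$$f(x, y) = \begin{cases} y(1 + x)^{-4}e^{-y(1+x)^{-1}}, & x, y \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

Find $E\{Y \mid X\}$.

(b) Do the same for the joint density

$$f(x, y) = \begin{cases} \tfrac{4}{5}(x + 3y)e^{-x-2y}, & x, y \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$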


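3. Let $(X, Y)$ be jointly distributed with bivariate normal density

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{x - \mu_1}{\sigma_1}\,\frac{y - \mu_2}{\sigma_2} + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]\right\}.$$

Find $E\{X \mid y\}$ and $E\{Y \mid x\}$. (Here, $\mu_1, \mu_2 \in R$, $\sigma_1, \sigma_2 > 0$, and $|\rho| < 1$.)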
4. Find $E(Y - E\{Y \mid X\})^2$.

5. Show that $E(Y - \phi(X))^2$ is minimized by choosing $\phi(X) = E\{Y \mid X\}$.

6. Let $X$ have PMF

$$P_\lambda\{X = x\} = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots,$$

and suppose that $\lambda$ is a realization of a RV $\Lambda$ with PDF

$$f(\lambda) = e^{-\lambda}, \qquad \lambda > 0.$$

Find $E\{e^{-\Lambda} \mid X = 1\}$.

7. Find $E(XY)$ by conditioning on $X$ or $Y$ for the following cases:
(a) $f(x, y) = xe^{-x(1+y)}$, $x \ge 0$, $y \ge 0$, and zero otherwise.
(b) $f(x, y) = 2$, $0 \le y \le x \le 1$, and zero otherwise.

8. Suppose that $X$ has uniform PDF $f(x) = 1$, $0 < x < 1$, and zero otherwise. Let $Y$ be chosen from interval $(0, X]$ according to the PDF

$$f_{Y|X}(y \mid x) = \frac{1}{x}, \qquad 0 < y \le x, \text{ and zero otherwise.}$$

Find $E\{Y^k \mid X\}$ and $EY^k$ for any fixed constant $k > 0$.


4.7 ORDER STATISTICS AND THEIR DISTRIBUTIONS 
Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional random variable, and $(x_1, x_2, \ldots, x_n)$ be an $n$-tuple assumed by $(X_1, X_2, \ldots, X_n)$. Arrange $(x_1, x_2, \ldots, x_n)$ in increasing order of magnitude so that

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)},$$

where $x_{(1)} = \min(x_1, x_2, \ldots, x_n)$, $x_{(2)}$ is the second smallest value in $x_1, x_2, \ldots, x_n$, and so on, $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$. If any two $x_i$, $x_j$ are equal, their order does not matter.

Definition 1. The function $X_{(k)}$ of $(X_1, X_2, \ldots, X_n)$ that takes on the value $x_{(k)}$ in each possible sequence $(x_1, x_2, \ldots, x_n)$ of values assumed by $(X_1, X_2, \ldots, X_n)$ is known as the $k$th-order statistic or statistic of order $k$. $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ is called the set of order statistics for $(X_1, X_2, \ldots, X_n)$.


Example 1. Let $X_1$, $X_2$, $X_3$ be three RVs of the discrete type. Also, let $X_1$, $X_3$ take on values 0, 1, and $X_2$ take on values 1, 2, 3. Then the RV $(X_1, X_2, X_3)$ assumes these triplets of values: $(0, 1, 0)$, $(0, 2, 0)$, $(0, 3, 0)$, $(0, 1, 1)$, $(0, 2, 1)$, $(0, 3, 1)$, $(1, 1, 0)$, $(1, 2, 0)$, $(1, 3, 0)$, $(1, 1, 1)$, $(1, 2, 1)$, $(1, 3, 1)$; $X_{(1)}$ takes on values 0, 1; $X_{(2)}$ takes on values 0, 1; and $X_{(3)}$ takes on values 1, 2, 3.

Theorem 1. Let $(X_1, X_2, \ldots, X_n)$ be an $n$-dimensional RV. Let $X_{(k)}$, $1 \le k \le n$, be the statistic of order $k$. Then $X_{(k)}$ is also an RV.


Statistical considerations such as sufficiency, completeness, invariance, and ancil- 
larity (Chapter 8) lead to the consideration of order statistics in problems of statistical 
inference. Order statistics are particularly useful in nonparametric statistics (Chap- 
ter 13), where, for example, many test procedures are based on ranks of observations. 
Many of these methods require the distribution of the ordered observations, which 
we now study. 

In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs. In the discrete case there is no magic formula to compute the distribution of any $X_{(j)}$ or any of the joint distributions. A direct computation is the best course of action.


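Example 2. Suppose that $X_k$'s are iid with geometric PMF

$$p_k = P(X = k) = pq^{k-1}, \qquad k = 1, 2, \ldots,\ 0 < p < 1,\ q = 1 - p.$$

Then for any integers $x \ge 1$ and $r \ge 1$,

$$P\{X_{(r)} = x\} = P\{X_{(r)} \le x\} - P\{X_{(r)} \le x - 1\}.$$

Now

$$P\{X_{(r)} \le x\} = P\{\text{at least } r \text{ of the } X\text{'s are} \le x\} = \sum_{i=r}^n \binom{n}{i}[P(X_1 \le x)]^i[P(X_1 > x)]^{n-i}$$

and

$$P(X_1 \ge x) = \sum_{k=x}^\infty pq^{k-1} = (1 - p)^{x-1}.$$

It follows that

$$P\{X_{(r)} = x\} = \sum_{i=r}^n \binom{n}{i} q^{(x-1)(n-i)}\{q^{n-i}(1 - q^x)^i - (1 - q^{x-1})^i\},$$

$x = 1, 2, \ldots$. In particular, let $n = r = 2$. Then

$$P\{X_{(2)} = x\} = pq^{x-1}(pq^{x-1} + 2 - 2q^{x-1}), \qquad x \ge 1.$$

Also, for integers $x, y \ge 1$ we have

$$\begin{aligned}
P\{X_{(1)} = x, X_{(2)} - X_{(1)} = y\} &= P\{X_{(1)} = x, X_{(2)} = x + y\} \\
&= P\{X_1 = x, X_2 = x + y\} + P\{X_1 = x + y, X_2 = x\} \\
&= 2pq^{x-1} \cdot pq^{x+y-1} \\
&= 2pq^{2x-2} \cdot pq^y = P\{X_{(1)} = x\}P\{X_{(2)} - X_{(1)} = y\}
\end{aligned}$$

and

$$P\{X_{(1)} = 1, X_{(2)} - X_{(1)} = 0\} = P\{X_{(1)} = X_{(2)} = 1\} = p^2.$$

It follows that $X_{(1)}$ and $X_{(2)} - X_{(1)}$ are independent RVs and, moreover, that $X_{(2)} - X_{(1)}$ has a geometric distribution.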
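
The direct computation recommended for the discrete case is easy to mechanize. As a sketch under stated assumptions (sample size $n = 2$, a hypothetical value $p = 1/3$, and helper names chosen for this illustration), the code below sums the joint geometric PMF over all pairs whose maximum equals $x$ and compares the result with the closed form $P\{X_{(2)} = x\} = pq^{x-1}(pq^{x-1} + 2 - 2q^{x-1})$.

```python
from fractions import Fraction as Fr

def geom_pmf(k, p):
    """P{X = k} = p q^(k-1), k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)

def max_pmf_direct(x, p):
    """P{X_(2) = x} for n = 2: sum the joint PMF over pairs (a, b) with max(a, b) = x."""
    return sum(geom_pmf(a, p) * geom_pmf(b, p)
               for a in range(1, x + 1)
               for b in range(1, x + 1)
               if max(a, b) == x)

def max_pmf_closed(x, p):
    """Closed form p q^(x-1) (p q^(x-1) + 2 - 2 q^(x-1)) for n = r = 2."""
    q = 1 - p
    return p * q ** (x - 1) * (p * q ** (x - 1) + 2 - 2 * q ** (x - 1))

p = Fr(1, 3)  # hypothetical success probability
checks = [max_pmf_direct(x, p) == max_pmf_closed(x, p) for x in range(1, 8)]
print(all(checks))
```

The direct sum is exact here because the event $\{X_{(2)} = x\}$ only involves values of $X_1$ and $X_2$ that do not exceed $x$.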


In the following we assume that $X_1, X_2, \ldots, X_n$ are iid RVs of the continuous type with PDF $f$. Let $\{X_{(1)}, X_{(2)}, \ldots, X_{(n)}\}$ be the set of order statistics for $X_1, X_2, \ldots, X_n$. Since the $X_i$ are all continuous type RVs, it follows with probability 1 that

$$X_{(1)} < X_{(2)} < \cdots < X_{(n)}.$$


Theorem 2. The joint PDF of $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is given by

$$g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \begin{cases} n!\displaystyle\prod_{i=1}^n f(x_{(i)}), & x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

Proof. The transformation from $(X_1, X_2, \ldots, X_n)$ to $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is not one-to-one. In fact, there are $n!$ possible arrangements of $x_1, x_2, \ldots, x_n$ in increasing order of magnitude. Thus there are $n!$ inverses to the transformation. For example, one of the $n!$ permutations might be

$$x_4 < x_1 < x_{n-1} < x_3 < \cdots < x_n < x_2.$$

Then the corresponding inverse is

$$x_4 = x_{(1)},\ x_1 = x_{(2)},\ x_{n-1} = x_{(3)},\ x_3 = x_{(4)},\ \ldots,\ x_n = x_{(n-1)},\ x_2 = x_{(n)}.$$

The Jacobian of this transformation is the determinant of an $n \times n$ identity matrix with rows rearranged, since each $x_{(i)}$ equals one and only one of $x_1, x_2, \ldots, x_n$. Therefore, $J = \pm 1$, and

$$g(x_{(2)}, x_{(n)}, x_{(4)}, x_{(1)}, \ldots, x_{(3)}, x_{(n-1)}) = |J|\prod_{i=1}^n f(x_{(i)}), \qquad x_{(1)} < x_{(2)} < \cdots < x_{(n)}.$$

The same expression holds for each of the $n!$ arrangements.

It follows (see Remark 4.4.2) that

$$g(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \sum_{\text{all } n! \text{ inverses}}\ \prod_{i=1}^n f(x_{(i)}) = \begin{cases} n!\,f(x_{(1)})f(x_{(2)})\cdots f(x_{(n)}) & \text{if } x_{(1)} < x_{(2)} < \cdots < x_{(n)}, \\ 0 & \text{otherwise.} \end{cases}$$


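Example 3. Let $X_1, X_2, X_3, X_4$ be iid RVs with PDF $f$. The joint PDF of $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}$ is

$$g(y_1, y_2, y_3, y_4) = \begin{cases} 4!\,f(y_1)f(y_2)f(y_3)f(y_4), & y_1 < y_2 < y_3 < y_4, \\ 0, & \text{otherwise.} \end{cases}$$

Let us compute the marginal PDF of $X_{(2)}$. We have

$$\begin{aligned}
g_2(y_2) &= 4!\iiint_{y_1 < y_2 < y_3 < y_4} f(y_1)f(y_2)f(y_3)f(y_4)\,dy_1\,dy_3\,dy_4 \\
&= 4!\,f(y_2)\int_{-\infty}^{y_2}\int_{y_2}^{\infty}\left[\int_{y_3}^{\infty} f(y_4)\,dy_4\right] f(y_3)f(y_1)\,dy_3\,dy_1 \\
&= 4!\,f(y_2)\int_{-\infty}^{y_2}\left\{\int_{y_2}^{\infty}[1 - F(y_3)]f(y_3)\,dy_3\right\} f(y_1)\,dy_1 \\
&= 4!\,f(y_2)\int_{-\infty}^{y_2}\frac{[1 - F(y_2)]^2}{2}\,f(y_1)\,dy_1 \\
&= 4!\,f(y_2)\,\frac{[1 - F(y_2)]^2}{2!}\,F(y_2), \qquad y_2 \in R.
\end{aligned}$$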


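The procedure for computing the marginal PDF of $X_{(r)}$, the $r$th-order statistic of $X_1, X_2, \ldots, X_n$, is similar. The following theorem summarizes the result.

Theorem 3. The marginal PDF of $X_{(r)}$ is given by

$$g_r(y_r) = \frac{n!}{(r - 1)!\,(n - r)!}\,[F(y_r)]^{r-1}[1 - F(y_r)]^{n-r} f(y_r), \tag{2}$$

where $F$ is the common DF of $X_1, X_2, \ldots, X_n$.

Proof.

$$\begin{aligned}
g_r(y_r) &= n!\,f(y_r)\int_{-\infty}^{y_r}\int_{-\infty}^{y_{r-1}}\cdots\int_{-\infty}^{y_2}\int_{y_r}^{\infty}\int_{y_{r+1}}^{\infty}\cdots\int_{y_{n-1}}^{\infty}\ \prod_{i \ne r} f(y_i)\,dy_n\cdots dy_{r+1}\,dy_1\cdots dy_{r-1} \\
&= n!\,f(y_r)\,\frac{[1 - F(y_r)]^{n-r}}{(n - r)!}\int_{-\infty}^{y_r}\int_{-\infty}^{y_{r-1}}\cdots\int_{-\infty}^{y_2}\ \prod_{i=1}^{r-1} f(y_i)\,dy_1\cdots dy_{r-1} \\
&= n!\,f(y_r)\,\frac{[1 - F(y_r)]^{n-r}}{(n - r)!}\,\frac{[F(y_r)]^{r-1}}{(r - 1)!},
\end{aligned}$$

as asserted.

We now compute the joint PDF of $X_{(j)}$ and $X_{(k)}$, $1 \le j < k \le n$.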


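Theorem 4. The joint PDF of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j - 1)!\,(k - j - 1)!\,(n - k)!}\,F^{j-1}(y_j)[F(y_k) - F(y_j)]^{k-j-1}[1 - F(y_k)]^{n-k} f(y_j)f(y_k) & \text{if } y_j < y_k, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Proof.

$$\begin{aligned}
g_{jk}(y_j, y_k) &= \int_{-\infty}^{y_j}\cdots\int_{-\infty}^{y_2}\int_{y_j}^{y_k}\cdots\int_{y_{k-2}}^{y_k}\int_{y_k}^{\infty}\cdots\int_{y_{n-1}}^{\infty} n!\,f(y_1)\cdots f(y_n)\,dy_n\cdots dy_{k+1}\,dy_{k-1}\cdots dy_{j+1}\,dy_1\cdots dy_{j-1} \\
&= n!\int_{-\infty}^{y_j}\cdots\int_{-\infty}^{y_2}\int_{y_j}^{y_k}\cdots\int_{y_{k-2}}^{y_k}\frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\,f(y_1)\cdots f(y_k)\,dy_{k-1}\cdots dy_{j+1}\,dy_1\cdots dy_{j-1} \\
&= n!\,\frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\,f(y_k)\int_{-\infty}^{y_j}\cdots\int_{-\infty}^{y_2}\frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k - j - 1)!}\,f(y_1)\cdots f(y_j)\,dy_1\cdots dy_{j-1} \\
&= n!\,\frac{[1 - F(y_k)]^{n-k}}{(n - k)!}\,\frac{[F(y_k) - F(y_j)]^{k-j-1}}{(k - j - 1)!}\,f(y_k)f(y_j)\,\frac{[F(y_j)]^{j-1}}{(j - 1)!}, \qquad y_j < y_k,
\end{aligned}$$

as asserted.

In a similar manner we can show that the joint PDF of $X_{(j_1)}, \ldots, X_{(j_k)}$, $1 \le j_1 < j_2 < \cdots < j_k \le n$, $1 \le k \le n$, is given by

$$g_{j_1, j_2, \ldots, j_k}(y_1, y_2, \ldots, y_k) = \frac{n!}{(j_1 - 1)!\,(j_2 - j_1 - 1)!\cdots(n - j_k)!}\,F^{j_1-1}(y_1)f(y_1)[F(y_2) - F(y_1)]^{j_2-j_1-1}f(y_2)\cdots[1 - F(y_k)]^{n-j_k}f(y_k)$$

for $y_1 < y_2 < \cdots < y_k$, and $= 0$ otherwise.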


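Example 4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$g_r(y_r) = \begin{cases} \dfrac{n!}{(r - 1)!\,(n - r)!}\,y_r^{r-1}(1 - y_r)^{n-r}, & 0 < y_r < 1\ (1 \le r \le n), \\ 0, & \text{otherwise.} \end{cases}$$

The joint distribution of $X_{(j)}$ and $X_{(k)}$ is given by

$$g_{jk}(y_j, y_k) = \begin{cases} \dfrac{n!}{(j - 1)!\,(k - j - 1)!\,(n - k)!}\,y_j^{j-1}(y_k - y_j)^{k-j-1}(1 - y_k)^{n-k}, & 0 < y_j < y_k < 1, \\ 0, & \text{otherwise,} \end{cases}$$

where $1 \le j < k \le n$.

The joint PDF of $X_{(1)}$ and $X_{(n)}$ is given by

$$g_{1n}(y_1, y_n) = n(n - 1)(y_n - y_1)^{n-2}, \qquad 0 < y_1 < y_n < 1,$$

and that of the range $R_n = X_{(n)} - X_{(1)}$ by

$$g_{R_n}(w) = \begin{cases} n(n - 1)w^{n-2}(1 - w), & 0 < w < 1, \\ 0, & \text{otherwise.} \end{cases}$$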
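
A quick numerical sanity check of the uniform-case densities is sketched below. It integrates the marginal density of $X_{(r)}$ and the density of the range with a simple midpoint rule; the sample size $n = 5$, the step count, and the function names are assumptions made for this illustration. The comparison value $r/(n+1)$ for $EX_{(r)}$ is the standard mean of a beta distribution with parameters $r$ and $n - r + 1$, which is not derived in the text above.

```python
from math import factorial

def g_r(y, r, n):
    """Marginal density of the r-th order statistic of n iid U(0, 1) RVs."""
    c = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return c * y ** (r - 1) * (1 - y) ** (n - r)

def g_range(w, n):
    """Density of the range R_n = X_(n) - X_(1) for n iid U(0, 1) RVs."""
    return n * (n - 1) * w ** (n - 2) * (1 - w)

def midpoint(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

n = 5  # hypothetical sample size
for r in range(1, n + 1):
    total = midpoint(lambda y: g_r(y, r, n), 0.0, 1.0)
    mean = midpoint(lambda y: y * g_r(y, r, n), 0.0, 1.0)
    print(r, round(total, 6), round(mean, 6), round(r / (n + 1), 6))
print(round(midpoint(lambda w: g_range(w, n), 0.0, 1.0), 6))
```

Each density should integrate to approximately 1, and the computed means should agree with $r/(n+1)$ up to the quadrature error.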


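Example 5. Let $X_{(1)}, X_{(2)}, X_{(3)}$ be the order statistics of iid RVs $X_1, X_2, X_3$ with common PDF

$$f(x) = \begin{cases} \beta e^{-x\beta}, & x > 0, \\ 0, & \text{otherwise} \end{cases} \qquad (\beta > 0).$$

Let $Y_1 = X_{(3)} - X_{(2)}$ and $Y_2 = X_{(2)}$. We show that $Y_1$ and $Y_2$ are independent. The joint PDF of $X_{(2)}$ and $X_{(3)}$ is given by

$$g_{23}(x, y) = \begin{cases} \dfrac{3!}{1!\,0!\,0!}\,(1 - e^{-x\beta})\,\beta e^{-x\beta}\,\beta e^{-y\beta}, & x < y, \\ 0, & \text{otherwise.} \end{cases}$$

The PDF of $(Y_1, Y_2)$ is

$$\begin{aligned}
f(y_1, y_2) &= 3!\,\beta^2(1 - e^{-y_2\beta})\,e^{-y_2\beta}\,e^{-(y_1+y_2)\beta} \\
&= \begin{cases} [3!\,\beta e^{-2\beta y_2}(1 - e^{-\beta y_2})](\beta e^{-y_1\beta}), & 0 < y_1 < \infty,\ 0 < y_2 < \infty, \\ 0, & \text{otherwise.} \end{cases}
\end{aligned}$$

It follows that $Y_1$ and $Y_2$ are independent.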
Finally, we consider the moments: namely, the means, variances, and covariances of order statistics. Suppose that $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F$. Let $g$ be a Borel function on $R$ such that $E|g(X)| < \infty$, where $X$ has DF $F$. Then for $1 \le r \le n$,

$$\begin{aligned}
\int_{-\infty}^{\infty} |g(x)|\,\frac{n!}{(r - 1)!\,(n - r)!}\,F^{r-1}(x)[1 - F(x)]^{n-r} f(x)\,dx &\le n\binom{n - 1}{r - 1}\int_{-\infty}^{\infty} |g(x)| f(x)\,dx \qquad (0 \le F \le 1) \\
&< \infty
\end{aligned}$$

and we write

$$Eg(X_{(r)}) = \int_{-\infty}^{\infty} g(y)g_r(y)\,dy$$

for $r = 1, 2, \ldots, n$. The converse also holds. Suppose that $E|g(X_{(r)})| < \infty$ for $r = 1, 2, \ldots, n$. Then

$$n\binom{n - 1}{r - 1}\int_{-\infty}^{\infty} |g(x)|\,F^{r-1}(x)[1 - F(x)]^{n-r} f(x)\,dx < \infty$$

for $r = 1, 2, \ldots, n$ and hence

$$n\int_{-\infty}^{\infty}\left[\sum_{r=1}^n \binom{n - 1}{r - 1}F^{r-1}(x)[1 - F(x)]^{n-r}\right] |g(x)| f(x)\,dx = n\int_{-\infty}^{\infty} |g(x)| f(x)\,dx < \infty.$$

Moreover, it also follows that

$$\sum_{r=1}^n Eg(X_{(r)}) = nEg(X).$$

As a consequence of the remarks above, we note that if $E|g(X_{(r)})| = \infty$ for some $r$, $1 \le r \le n$, then $E|g(X)| = \infty$, and conversely, if $E|g(X)| = \infty$, then $E|g(X_{(r)})| = \infty$ for some $r$, $1 \le r \le n$.


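Example 6. Let $X_1, X_2, \ldots, X_n$ be iid with Pareto PDF $f(x) = 1/x^2$, if $x \ge 1$, and $= 0$ otherwise. Then $EX = \infty$. Now for $1 \le r \le n$,

$$\begin{aligned}
EX_{(r)} &= n\binom{n - 1}{r - 1}\int_1^\infty x\left(1 - \frac{1}{x}\right)^{r-1}\frac{1}{x^{n-r}}\,\frac{dx}{x^2} \\
&= n\binom{n - 1}{r - 1}\int_0^1 y^{r-1}(1 - y)^{n-r-1}\,dy.
\end{aligned}$$

Since the integral on the right side converges for $1 \le r \le n - 1$ and diverges for $r > n - 1$, we see that $EX_{(r)} = \infty$ for $r = n$.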


PROBLEMS 4.7 


1. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics of independent RVs $X_1, X_2, \ldots, X_n$ with common PDF

$$f(x) = \begin{cases} \beta e^{-x\beta} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that $X_{(r)}$ and $X_{(s)} - X_{(r)}$ are independent for any $s > r$.
(b) Find the PDF of $X_{(r+1)} - X_{(r)}$.
(c) Let $Z_1 = nX_{(1)}$, $Z_2 = (n - 1)(X_{(2)} - X_{(1)})$, $Z_3 = (n - 2)(X_{(3)} - X_{(2)})$, $\ldots$, $Z_n = (X_{(n)} - X_{(n-1)})$. Show that $(Z_1, Z_2, \ldots, Z_n)$ and $(X_1, X_2, \ldots, X_n)$ are identically distributed.


2. Let X1, X2,... , Xn be iid from PMF 


Find the marginal distributions of $X_{(1)}$, $X_{(n)}$, and their joint PMF.

3. Let $X_1, X_2, \ldots, X_n$ be iid with a DF


y® fO<y <1, 
0 otherwise, a> 0. 


ro=| 


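Show that $X_{(i)}/X_{(n)}$, $i = 1, 2, \ldots, n - 1$, and $X_{(n)}$ are independent.

4. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common Pareto DF $f(x) = \alpha\sigma^\alpha/x^{\alpha+1}$, $x > \sigma$ where $\alpha > 0$, $\sigma > 0$. Show that:
(a) $X_{(1)}$ and $(X_{(2)}/X_{(1)}, \ldots, X_{(n)}/X_{(1)})$ are independent.
(b) $X_{(1)}$ has Pareto $(\sigma, n\alpha)$ distribution.
(c) $\sum_{j=1}^n \ln(X_{(j)}/X_{(1)})$ has PDF

$$f(x) = \frac{\alpha^{n-1}x^{n-2}e^{-\alpha x}}{(n - 2)!}, \qquad x > 0.$$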


5. Let $X_1, X_2, \ldots, X_n$ be iid nonnegative RVs of the continuous type. If $E|X| < \infty$, show that $E|X_{(r)}| < \infty$. Write $M_n = X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Show that

$$EM_n = EM_{n-1} + \int_0^\infty F^{n-1}(x)[1 - F(x)]\,dx, \qquad n = 2, 3, \ldots.$$

Find $EM_n$ in each of the following cases:
(a) $X_i$ have the common DF

$$F(x) = 1 - e^{-\beta x}, \qquad x \ge 0.$$

(b) $X_i$ have the common DF

$$F(x) = x, \qquad 0 < x < 1.$$

6. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of $n$ independent RVs $X_1, X_2, \ldots, X_n$ with common PDF $f(x) = 1$ if $0 < x < 1$, and $= 0$ otherwise. Show that $Y_1 = X_{(1)}/X_{(2)}$, $Y_2 = X_{(2)}/X_{(3)}$, $\ldots$, $Y_{n-1} = X_{(n-1)}/X_{(n)}$, and $Y_n = X_{(n)}$ are independent. Find the PDFs of $Y_1, Y_2, \ldots, Y_n$.

7. For the PDF in Problem 4, find $EX_{(r)}$.

8. An urn contains $N$ identical marbles numbered 1 through $N$. From the urn $n$ marbles are drawn, and let $X_{(n)}$ be the largest number drawn. Show that

$$P(X_{(n)} = k) = \binom{k - 1}{n - 1}\bigg/\binom{N}{n}, \qquad k = n, n + 1, \ldots, N,$$

and $EX_{(n)} = n(N + 1)/(n + 1)$.


CHAPTER 5 
Some Special Distributions 


5.1 INTRODUCTION 


In preceding chapters we studied probability distributions in general. In this chapter 
we study some commonly occurring probability distributions and investigate their 
basic properties. The results of this chapter will be of considerable use in theoretical 
as well as practical applications. We begin with some discrete distributions in Sec- 
tion 5.2 and follow with some continuous models in Section 5.3. Section 5.4 deals 
with bivariate and multivariate normal distributions, and in Section 5.5 we discuss 
the exponential family of distributions. 


5.2 SOME DISCRETE DISTRIBUTIONS 


In this section we study some well-known univariate and multivariate discrete distri- 
butions and describe their important properties. 


5.2.1 Degenerate Distribution 


The simplest distribution is that of an RV $X$ degenerate at point $k$, that is, $P\{X = k\} = 1$ and $= 0$ elsewhere. If we define

$$\varepsilon(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases} \tag{1}$$

the DF of the RV $X$ is $\varepsilon(x - k)$. Clearly, $EX^l = k^l$, $l = 1, 2, \ldots$, and $M(t) = e^{tk}$. In particular, $\operatorname{var}(X) = 0$. This property characterizes a degenerate RV. As we shall see, the degenerate RV plays an important role in the study of limit theorems.


5.2.2 Two-Point Distribution 


We say that an RV $X$ has a two-point distribution if it takes two values, $x_1$ and $x_2$, with probabilities

$$P\{X = x_1\} = p \quad \text{and} \quad P\{X = x_2\} = 1 - p, \qquad 0 < p < 1.$$

We may write

$$X = x_1I_{[X=x_1]} + x_2I_{[X=x_2]}, \tag{2}$$

where $I_A$ is the indicator function of $A$. The DF of $X$ is given by

$$F(x) = p\,\varepsilon(x - x_1) + (1 - p)\,\varepsilon(x - x_2). \tag{3}$$

Also,

$$EX^k = px_1^k + (1 - p)x_2^k, \qquad k = 1, 2, \ldots, \tag{4}$$

and

$$M(t) = pe^{tx_1} + (1 - p)e^{tx_2} \quad \text{for all } t. \tag{5}$$

In particular,

$$EX = px_1 + (1 - p)x_2 \tag{6}$$

and

$$\operatorname{var}(X) = p(1 - p)(x_1 - x_2)^2. \tag{7}$$

If $x_1 = 1$, $x_2 = 0$, we get the important Bernoulli RV:

$$P\{X = 1\} = p \quad \text{and} \quad P\{X = 0\} = 1 - p, \qquad 0 < p < 1. \tag{8}$$

For a Bernoulli RV $X$ with parameter $p$, we write $X \sim b(1, p)$ and have

$$EX = p, \quad \operatorname{var}(X) = p(1 - p), \quad \text{and} \quad M(t) = 1 + p(e^t - 1), \quad \text{all } t. \tag{9}$$


Bernoulli RVs occur in practice, for example, in coin-tossing experiments. Suppose that $P\{H\} = p$, $0 < p < 1$, and $P\{T\} = 1 - p$. Define RV $X$ so that $X(H) = 1$ and $X(T) = 0$. Then $P\{X = 1\} = p$ and $P\{X = 0\} = 1 - p$. Each repetition of the experiment will be called a trial. More generally, any nontrivial experiment can be dichotomized to yield a Bernoulli model. Let $(\Omega, S, P)$ be the sample space of an experiment, and let $A \in S$ with $P(A) = p > 0$. Then $P(A^c) = 1 - p$. Each performance of the experiment is a Bernoulli trial. It will be convenient to call the occurrence of event $A$ a success and the occurrence of $A^c$ a failure.


Example 1 (Sabharwal [95]). In a sequence of $n$ Bernoulli trials with constant probability $p$ of success ($S$), and $1 - p$ of failure ($F$), let $Y_n$ denote the number of times the combination $SF$ occurs. To find $EY_n$ and $\operatorname{var}(Y_n)$, let $X_i$ represent the event that occurs on the $i$th trial, and define RVs

$$f(X_i, X_{i+1}) = \begin{cases} 1 & \text{if } X_i = S,\ X_{i+1} = F, \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, 2, \ldots, n - 1).$$

Then

$$Y_n = \sum_{i=1}^{n-1} f(X_i, X_{i+1})$$

and

$$EY_n = (n - 1)p(1 - p).$$

Also,

$$\begin{aligned}
EY_n^2 &= E\left[\sum_{i=1}^{n-1} f^2(X_i, X_{i+1})\right] + E\left[\sum\sum_{i \ne j} f(X_i, X_{i+1})f(X_j, X_{j+1})\right] \\
&= (n - 1)p(1 - p) + (n - 2)(n - 3)p^2(1 - p)^2,
\end{aligned}$$

so that

$$\operatorname{var}(Y_n) = p(1 - p)[n - 1 + p(1 - p)(5 - 3n)].$$

If $p = \tfrac{1}{2}$, then

$$EY_n = \frac{n - 1}{4} \quad \text{and} \quad \operatorname{var}(Y_n) = \frac{n + 1}{16}.$$
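
For small $n$ the mean and variance of $Y_n$ can be confirmed by brute force. The sketch below enumerates all $2^n$ outcome sequences with exact rational probabilities; the values $n = 6$ and $p = 1/3$ and the function name `sf_count_moments` are hypothetical choices for this illustration.

```python
from fractions import Fraction as Fr
from itertools import product

def sf_count_moments(n, p):
    """Exact mean and variance of Y_n, the number of 'SF' pairs in n Bernoulli trials."""
    q = 1 - p
    m1 = m2 = 0
    for seq in product("SF", repeat=n):
        prob = 1
        for outcome in seq:
            prob *= p if outcome == "S" else q
        # Count positions i with a success at trial i followed by a failure.
        y = sum(1 for i in range(n - 1) if seq[i] == "S" and seq[i + 1] == "F")
        m1 += prob * y
        m2 += prob * y * y
    return m1, m2 - m1 * m1

n, p = 6, Fr(1, 3)
q = 1 - p
mean, var = sf_count_moments(n, p)
print(mean == (n - 1) * p * q)
print(var == p * q * (n - 1 + p * q * (5 - 3 * n)))
```

The enumeration grows as $2^n$, so it is only practical as a check for small $n$.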
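5.2.3 Uniform Distribution on n Points

$X$ is said to have a uniform distribution on $n$ points $\{x_1, x_2, \ldots, x_n\}$ if its PMF is of the form

$$P\{X = x_i\} = \frac{1}{n}, \qquad i = 1, 2, \ldots, n. \tag{10}$$

Thus we may write

$$X = \sum_{i=1}^n x_iI_{[X=x_i]} \quad \text{and} \quad F(x) = \frac{1}{n}\sum_{i=1}^n \varepsilon(x - x_i),$$

$$EX = \frac{1}{n}\sum_{i=1}^n x_i, \tag{11}$$

$$EX^l = \frac{1}{n}\sum_{i=1}^n x_i^l, \qquad l = 1, 2, \ldots, \tag{12}$$

and

$$\operatorname{var}(X) = \frac{1}{n}\sum_{i=1}^n x_i^2 - \left(\frac{1}{n}\sum_{i=1}^n x_i\right)^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 \tag{13}$$

if we write $\bar{x} = \sum_{i=1}^n x_i/n$. Also,

$$M(t) = \frac{1}{n}\sum_{i=1}^n e^{tx_i} \quad \text{for all } t. \tag{14}$$

If, in particular, $x_i = i$, $i = 1, 2, \ldots, n$,

$$EX = \frac{n + 1}{2}, \qquad EX^2 = \frac{(n + 1)(2n + 1)}{6}, \tag{15}$$

and

$$\operatorname{var}(X) = \frac{n^2 - 1}{12}. \tag{16}$$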


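Example 2. A box contains tickets numbered 1 to $N$. Let $X$ be the largest number drawn in $n$ random drawings with replacement. Then $P\{X \le k\} = (k/N)^n$, so that

$$P\{X = k\} = P\{X \le k\} - P\{X \le k - 1\} = \left(\frac{k}{N}\right)^n - \left(\frac{k - 1}{N}\right)^n.$$

Also,

$$\begin{aligned}
EX &= N^{-n}\sum_{k=1}^N [k^{n+1} - (k - 1)^{n+1} - (k - 1)^n] \\
&= N^{-n}\left[N^{n+1} - \sum_{k=1}^N (k - 1)^n\right].
\end{aligned}$$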


5.2.4 Binomial Distribution 


We say that $X$ has a binomial distribution with parameter $p$ if its PMF is given by

$$p_k = P\{X = k\} = \binom{n}{k}p^k(1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n;\ 0 \le p \le 1. \tag{17}$$

Since $\sum_{k=0}^n p_k = [p + (1 - p)]^n = 1$, the $p_k$'s indeed define a PMF. If $X$ has PMF (17), we will write $X \sim b(n, p)$. This is consistent with the notation for a Bernoulli RV. We have

$$F(x) = \sum_{k=0}^n \binom{n}{k}p^k(1 - p)^{n-k}\,\varepsilon(x - k).$$

In Example 3.2.5 we showed that

$$EX = np, \tag{18}$$

$$EX^2 = n(n - 1)p^2 + np, \tag{19}$$

and

$$\operatorname{var}(X) = np(1 - p) = npq, \tag{20}$$

where $q = 1 - p$. Also,

$$M(t) = \sum_{k=0}^n e^{tk}\binom{n}{k}p^k(1 - p)^{n-k} = (q + pe^t)^n \quad \text{for all } t. \tag{21}$$

The PGF of $X \sim b(n, p)$ is given by $P(s) = \{1 - p(1 - s)\}^n$, $|s| \le 1$.
Binomial distribution can also be considered as the distribution of the sum of $n$ independent, identically distributed $b(1, p)$ random variables. If we toss a coin, with constant probability $p$ of heads and $1 - p$ of tails, $n$ times, the distribution of the number of heads is given by (17). Alternatively, if we write

$$X_k = \begin{cases} 1 & \text{if the } k\text{th toss results in a head,} \\ 0 & \text{otherwise,} \end{cases}$$

the number of heads in $n$ trials is the sum $S_n = X_1 + X_2 + \cdots + X_n$. Also

$$P\{X_k = 1\} = p \quad \text{and} \quad P\{X_k = 0\} = 1 - p, \qquad k = 1, 2, \ldots, n.$$

Thus

$$ES_n = \sum_{i=1}^n EX_i = np,$$

$$\operatorname{var}(S_n) = \sum_{i=1}^n \operatorname{var}(X_i) = np(1 - p),$$

and

$$M(t) = \prod_{i=1}^n Ee^{tX_i} = (q + pe^t)^n.$$

Theorem 1. Let $X_i$ $(i = 1, 2, \ldots, k)$ be independent RVs with $X_i \sim b(n_i, p)$. Then $S_k = \sum_{i=1}^k X_i$ has a $b(n_1 + n_2 + \cdots + n_k, p)$ distribution.

Corollary. If $X_i$ $(i = 1, 2, \ldots, k)$ are iid RVs with common PMF $b(n, p)$, then $S_k$ has a $b(nk, p)$ distribution.
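
The additive property for $k = 2$ can be illustrated by convolving two binomial PMFs with the same $p$ and comparing the result with the $b(n_1 + n_2, p)$ PMF. The parameter values $n_1 = 3$, $n_2 = 5$, $p = 2/7$ and the helper names below are assumptions made for this sketch.

```python
from fractions import Fraction as Fr
from math import comb

def binom_pmf(n, p):
    """List [P{X = 0}, ..., P{X = n}] for X ~ b(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent RVs supported on 0, 1, 2, ..."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

p = Fr(2, 7)  # hypothetical common success probability
lhs = convolve(binom_pmf(3, p), binom_pmf(5, p))
print(lhs == binom_pmf(8, p))
```

The equality is exact because the PMFs are computed with rational arithmetic; it is an instance of the Vandermonde identity for binomial coefficients.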


Actually, the additive property described in Theorem 1 characterizes the binomial distribution in the following sense. Let $X$ and $Y$ be two independent, nonnegative, finite integer-valued RVs and let $Z = X + Y$. Then $Z$ is a binomial RV with parameter $p$ if and only if $X$ and $Y$ are binomial RVs with the same parameter $p$. The “only if” part is due to Shanbhag and Basawa [101] and will not be proved here.


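Example 3. A fair die is rolled $n$ times. The probability of obtaining exactly one 6 is $n(\tfrac{1}{6})(\tfrac{5}{6})^{n-1}$, the probability of obtaining no 6 is $(\tfrac{5}{6})^n$, and the probability of obtaining at least one 6 is $1 - (\tfrac{5}{6})^n$.

The number of trials needed for the probability of at least one 6 to be $\ge \tfrac{1}{2}$ is given by the smallest integer $n$ such that

$$1 - \left(\frac{5}{6}\right)^n \ge \frac{1}{2},$$

so that

$$n \ge \frac{\log 2}{\log 1.2} \approx 3.8.$$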
Example 4. Here $r$ balls are distributed in $n$ cells so that each of $n^r$ possible arrangements has probability $n^{-r}$. We are interested in the probability $p_k$ that a specified cell has exactly $k$ balls $(k = 0, 1, 2, \ldots, r)$. Then the distribution of each ball may be considered as a trial. A success results if the ball goes to the specified cell (with probability $1/n$); otherwise, the trial results in a failure (with probability $1 - 1/n$). Let $X$ denote the number of successes in $r$ trials. Then

$$p_k = P\{X = k\} = \binom{r}{k}\left(\frac{1}{n}\right)^k\left(1 - \frac{1}{n}\right)^{r-k}, \qquad k = 0, 1, 2, \ldots, r.$$


5.2.5 Negative Binomial Distribution (Pascal or Waiting-Time Distribution) 


Let $(\Omega, \mathcal{S}, P)$ be a probability space of a given statistical experiment, and let $A \in \mathcal{S}$
with P(A) = p. On any performance of the experiment, if A happens we call it a
success, otherwise a failure. Consider a succession of trials of this experiment, and
let us compute the probability of observing exactly r successes, where $r \geq 1$ is a
fixed integer. If X denotes the number of failures that precede the rth success, X + r
is the total number of replications needed to produce r successes. This will happen
if and only if the last trial results in a success and among the previous (r + X - 1)
trials there are exactly X failures. It follows by independence that

(22)  $P\{X = x\} = \binom{x + r - 1}{x} p^{r}(1 - p)^{x}, \qquad x = 0, 1, 2, \ldots.$

Rewriting (22) in the form

(23)  $P\{X = x\} = \binom{-r}{x} p^{r}(-q)^{x}, \qquad x = 0, 1, 2, \ldots; \quad q = 1 - p,$

we see that

(24)  $\sum_{x=0}^{\infty} \binom{-r}{x}(-q)^{x} = (1 - q)^{-r} = p^{-r}.$

It follows that

$$\sum_{x=0}^{\infty} P\{X = x\} = 1.$$


Definition 1. For a fixed positive integer $r \geq 1$ and 0 < p < 1, an RV with PMF
given by (22) is said to have a negative binomial distribution. We use the notation
X ~ NB(r; p) to denote that X has a negative binomial distribution.


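We may write

$$X = \sum_{x=0}^{\infty} x\, I_{[X = x]} \quad\text{and}\quad F(x) = \sum_{k=0}^{\infty} \binom{k + r - 1}{k} p^{r}(1 - p)^{k}\, \varepsilon(x - k).$$

For the MGF of X we have

(25)  $M(t) = \sum_{x=0}^{\infty} \binom{x + r - 1}{x} p^{r}(1 - p)^{x} e^{tx}$

$\qquad = p^{r} \sum_{x=0}^{\infty} (qe^{t})^{x} \binom{x + r - 1}{x} \qquad (q = 1 - p)$

$\qquad = p^{r}(1 - qe^{t})^{-r} \quad \text{for } qe^{t} < 1.$

The PGF is given by $P(s) = p^{r}(1 - sq)^{-r}$, $|s| \leq 1$. Also,

(26)  $EX = \sum_{x=0}^{\infty} x \binom{x + r - 1}{x} p^{r} q^{x}$

$\qquad = r p^{r} \sum_{x=0}^{\infty} \binom{x + r}{x} q^{x+1}$

$\qquad = r p^{r} q (1 - q)^{-r-1} = \dfrac{rq}{p}.$

Similarly, we can show that

(27)  $\operatorname{var}(X) = \dfrac{rq}{p^{2}}.$

If, however, we are interested in the distribution of the number of trials required
to get r successes, we have, writing Y = X + r,

(28)  $P\{Y = y\} = \binom{y - 1}{r - 1} p^{r}(1 - p)^{y-r}, \qquad y = r, r + 1, \ldots,$

(29)  $EY = EX + r = \dfrac{r}{p}, \qquad \operatorname{var}(Y) = \operatorname{var}(X) = \dfrac{rq}{p^{2}},$

and

(30)  $M_Y(t) = (pe^{t})^{r}(1 - qe^{t})^{-r} \quad \text{for } qe^{t} < 1.$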
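
The normalization (24) and the moments of the negative binomial distribution can be checked numerically by truncating the series. The sketch below assumes the values r = 3 and p = 0.4 and a cutoff of 500 terms, all chosen only for illustration; the function names are ours.

```python
from math import comb

def nb_pmf(x, r, p):
    """PMF (22): P{X = x} = C(x + r - 1, x) p^r (1 - p)^x."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

def nb_summary(r, p, terms=500):
    """Truncated-series approximations of total mass, EX, and var(X)."""
    probs = [nb_pmf(x, r, p) for x in range(terms)]
    total = sum(probs)
    mean = sum(x * px for x, px in enumerate(probs))
    second = sum(x * x * px for x, px in enumerate(probs))
    return total, mean, second - mean**2

r, p = 3, 0.4                      # illustrative values
q = 1 - p
total, mean, var = nb_summary(r, p)
print(total, mean, var)            # compare with 1, rq/p, rq/p^2
print(r * q / p, r * q / p**2)
```

Since Y = X + r, the mean of the number of trials is obtained by adding r to the computed mean, and the variance is unchanged.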


Let X be a b(n, p) RV, and let Y be the RV defined in (28). If there are r or more
successes in the first n trials, at most n trials were required to obtain the first r of
these successes. We have

(31)  $P\{X \geq r\} = P\{Y \leq n\}$

and also

(32)  $P\{X < r\} = P\{Y > n\}.$

In the special case when r = 1, the distribution of X in (22) is given by

(33)  $P\{X = x\} = pq^{x}, \qquad x = 0, 1, 2, \ldots.$

An RV X with PMF (33) is said to have a geometric distribution. Clearly, for the
geometric distribution, we have

(34)  $M(t) = p(1 - qe^{t})^{-1}, \qquad EX = \dfrac{q}{p}, \qquad\text{and}\qquad \operatorname{var}(X) = \dfrac{q}{p^{2}}.$


Example 5 (Banach’s Matchbox Problem). A mathematician carries one 
matchbox each in his right and left pockets. When he wants a match, he selects 
the left pocket with probability p and the right pocket with probability 1 — p. Sup- 
pose that initially each box contains N matches. Consider the moment when the 
mathematician discovers that a box is empty. At that time the other box may contain
0, 1, 2, ..., N matches. Let us identify success with the choice of the left pocket.
The left-pocket box will be empty at the moment when the right-pocket box contains 
exactly r matches if and only if exactly N —r failures precede the (N + 1)st success. 
A similar argument applies to the right pocket, and we have 


$p_r$ = probability that the mathematician discovers a box empty while
the other contains r matches

$$= \binom{2N - r}{N} p^{N+1} q^{N-r} + \binom{2N - r}{N} q^{N+1} p^{N-r}, \qquad q = 1 - p.$$

Example 6. A fair die is rolled repeatedly. Let us compute the probability of event
A that a 2 will show up before a 5. Let $A_j$ be the event that a 2 shows up on the jth
trial (j = 1, 2, ...) for the first time, and a 5 does not show up on the previous j - 1
trials. Then $PA = \sum_{j=1}^{\infty} PA_j$, where $PA_j = \tfrac{1}{6}\left(\tfrac{4}{6}\right)^{j-1}$. It follows that

$$P(A) = \sum_{j=1}^{\infty} \frac{1}{6}\left(\frac{4}{6}\right)^{j-1} = \frac{1}{2}.$$

Similarly, the probability that a 2 will show up before a 5 or a 6 is $\tfrac{1}{3}$, and so on.
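
Both Example 5 and Example 6 lend themselves to an exact check. In the sketch below (our own illustration, with hypothetical function names and the arbitrary values N = 5, p = 1/3), the matchbox probabilities $p_r$ are summed over r = 0, 1, ..., N, and the geometric series of Example 6 is evaluated in closed form for one or two competing faces.

```python
from fractions import Fraction
from math import comb

def matchbox_pmf(N, p):
    """[p_0, ..., p_N] from Example 5, with q = 1 - p."""
    q = 1 - p
    return [
        comb(2 * N - r, N) * (p**(N + 1) * q**(N - r) + q**(N + 1) * p**(N - r))
        for r in range(N + 1)
    ]

def two_before(competitors):
    """P(a 2 shows up before any of `competitors` other given faces).

    Sum of the geometric series with first term 1/6 and ratio
    (5 - competitors)/6, as in Example 6.
    """
    ratio = Fraction(5 - competitors, 6)
    return Fraction(1, 6) / (1 - ratio)

print(sum(matchbox_pmf(5, Fraction(1, 3))))   # total probability over r
print(two_before(1), two_before(2))           # 1/2 and 1/3
```

The first sum equals 1 because, with probability 1, one of the two boxes is eventually found empty while the other holds some r in {0, 1, ..., N}.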


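Theorem 2. Let $X_1, X_2, \ldots, X_k$ be independent $NB(r_i; p)$ RVs, i = 1, 2, ..., k,
respectively. Then $S_k = \sum_{i=1}^{k} X_i$ is distributed as $NB(r_1 + r_2 + \cdots + r_k; p)$.

Corollary. If $X_1, X_2, \ldots, X_k$ are iid geometric RVs, then $S_k$ is an NB(k; p) RV.

Theorem 3. Let X and Y be independent RVs with PMFs $NB(r_1; p)$ and
$NB(r_2; p)$, respectively. Then the conditional PMF of X, given X + Y = t, is
expressed by

$$P\{X = x \mid X + Y = t\} = \frac{\binom{x + r_1 - 1}{x}\binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \qquad x = 0, 1, \ldots, t.$$

If, in particular, $r_1 = r_2 = 1$, the conditional distribution is uniform on t + 1 points.

Proof. By Theorem 2, X + Y is an $NB(r_1 + r_2; p)$ RV. Thus

$$P\{X = x \mid X + Y = t\} = \frac{P\{X = x,\ Y = t - x\}}{P\{X + Y = t\}}$$

$$= \frac{\binom{x + r_1 - 1}{x} p^{r_1}(1 - p)^{x} \binom{t - x + r_2 - 1}{t - x} p^{r_2}(1 - p)^{t-x}}{\binom{t + r_1 + r_2 - 1}{t} p^{r_1 + r_2}(1 - p)^{t}}$$

$$= \frac{\binom{x + r_1 - 1}{x}\binom{t - x + r_2 - 1}{t - x}}{\binom{t + r_1 + r_2 - 1}{t}}, \qquad x = 0, 1, \ldots, t.$$

If $r_1 = r_2 = 1$, that is, if X and Y are independent geometric RVs, then

(35)  $P\{X = x \mid X + Y = t\} = \dfrac{1}{t + 1}, \qquad x = 0, 1, \ldots, t.$

Theorem 4 (Chatterji [12]). Let X and Y be iid RVs, and let

$$P\{X = k\} = p_k > 0, \qquad k = 0, 1, 2, \ldots.$$

If

(36)  $P\{X = t \mid X + Y = t\} = P\{X = t - 1 \mid X + Y = t\} = \dfrac{1}{t + 1}, \qquad t > 0,$

then X and Y are geometric RVs.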


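Proof. We have

(37)  $P\{X = t \mid X + Y = t\} = \dfrac{p_t p_0}{\sum_{k=0}^{t} p_k p_{t-k}} = \dfrac{1}{t + 1}$

and

(38)  $P\{X = t - 1 \mid X + Y = t\} = \dfrac{p_{t-1} p_1}{\sum_{k=0}^{t} p_k p_{t-k}} = \dfrac{1}{t + 1}.$

It follows that

$$\frac{p_t}{p_{t-1}} = \frac{p_1}{p_0},$$

and by iteration $p_t = (p_1/p_0)^{t} p_0$. Since $\sum_{t=0}^{\infty} p_t = 1$, we must have $p_1/p_0 < 1$.
Moreover,

$$1 = p_0 \cdot \frac{1}{1 - (p_1/p_0)},$$

so that $p_1/p_0 = 1 - p_0$, and the proof is complete.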


Theorem 5. If X has a geometric distribution, then for any two nonnegative
integers m and n,

(39)  $P\{X \geq m + n \mid X \geq m\} = P\{X \geq n\}.$

The proof is left as an exercise.


Remark 1. Theorem 5 says that the geometric distribution has no memory; that 
is, the information of no successes in m trials is forgotten in subsequent calculations. 
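
The lack-of-memory property (39) can be checked exactly for particular values. A minimal sketch, assuming p = 1/4, m = 3, and n = 5 purely for illustration; the tail probability is computed from the PMF (33) rather than from its closed form.

```python
from fractions import Fraction

def geom_pmf(x, p):
    """PMF (33): P{X = x} = p q^x, x = 0, 1, 2, ..."""
    return p * (1 - p) ** x

def tail(k, p):
    """P{X >= k}, computed as 1 - sum_{x < k} P{X = x}."""
    return 1 - sum(geom_pmf(x, p) for x in range(k))

p = Fraction(1, 4)                 # illustrative value
m, n = 3, 5
conditional = tail(m + n, p) / tail(m, p)
print(conditional == tail(n, p))   # both sides of (39) agree
```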


The converse of Theorem 5 is also true. 


Theorem 6. Let X be a nonnegative integer-valued RV satisfying

$$P\{X > m + 1 \mid X > m\} = P\{X \geq 1\}$$

for any nonnegative integer m. Then X must have a geometric distribution.

Proof. Let the PMF of X be written as

$$P\{X = k\} = p_k, \qquad k = 0, 1, 2, \ldots.$$

Then

$$P\{X \geq n\} = \sum_{k=n}^{\infty} p_k$$

and

$$P\{X > m\} = \sum_{k=m+1}^{\infty} p_k = q_m, \quad\text{say},$$

$$P\{X > m + 1 \mid X > m\} = \frac{P\{X > m + 1\}}{P\{X > m\}} = \frac{q_{m+1}}{q_m}.$$

Thus

$$q_{m+1} = q_m q_0,$$

where $q_0 = P\{X > 0\} = p_1 + p_2 + \cdots = 1 - p_0$. It follows that $q_k = (1 - p_0)^{k+1}$,
and hence $p_k = q_{k-1} - q_k = (1 - p_0)^{k} p_0$, as asserted.

Theorem 7. Let $X_1, X_2, \ldots, X_n$ be independent geometric RVs with parameters
$p_1, p_2, \ldots, p_n$, respectively. Then $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ is also a geometric
RV with parameter

$$p = 1 - \prod_{i=1}^{n} (1 - p_i).$$

The proof is left as an exercise.

Corollary. Iid RVs $X_1, X_2, \ldots, X_n$ are NB(1; p) if and only if $X_{(1)}$ is a geometric
RV with parameter $1 - (1 - p)^{n}$.

Proof. The necessity follows from Theorem 7. For the sufficiency part of the
proof, let

$$P\{X_{(1)} \leq k\} = 1 - P\{X_{(1)} > k\} = 1 - (1 - p)^{n(k+1)}.$$

But

$$P\{X_{(1)} \leq k\} = 1 - P\{X_1 > k, X_2 > k, \ldots, X_n > k\} = 1 - [1 - F(k)]^{n},$$

where F is the common DF of $X_1, X_2, \ldots, X_n$. It follows that

$$1 - F(k) = (1 - p)^{k+1},$$

so that $P\{X_1 > k\} = (1 - p)^{k+1}$, which completes the proof.


5.2.6 Hypergeometric Distribution 


A box contains N marbles. Of these, M are drawn at random, marked, and returned
to the box. The contents of the box are then thoroughly mixed. Next, n marbles are
drawn at random from the box, and the marked marbles are counted. If X denotes
the number of marked marbles, then

(40)  $P\{X = x\} = \binom{N}{n}^{-1} \binom{M}{x} \binom{N - M}{n - x}.$

Since x cannot exceed M or n, we must have

(41)  $x \leq \min(M, n).$

Also, $x \geq 0$ and $N - M \geq n - x$, so that

(42)  $x \geq \max(0, M + n - N).$

Definition 2. An RV X with PMF given by (40) is called a hypergeometric RV.

Note that

$$\sum_{k=0}^{n} \binom{a}{k}\binom{b}{n - k} = \binom{a + b}{n}$$

for arbitrary numbers a, b and positive integer n. It follows that

$$\sum_{x} P\{X = x\} = \binom{N}{n}^{-1} \sum_{x} \binom{M}{x}\binom{N - M}{n - x} = 1.$$

It is easy to check that

(43)  $EX = \dfrac{n}{N} M,$

(44)  $EX^{2} = \dfrac{M(M - 1)}{N(N - 1)}\, n(n - 1) + \dfrac{nM}{N},$

and

(45)  $\operatorname{var}(X) = \dfrac{nM}{N^{2}(N - 1)} (N - M)(N - n).$

Example 7. A lot consisting of 50 bulbs is inspected by taking at random 10
bulbs and testing them. If the number of defective bulbs is at most 1, the lot is
accepted; otherwise, it is rejected. If there are, in fact, 10 defective bulbs in the lot, the
probability of accepting the lot is

$$\frac{\binom{10}{1}\binom{40}{9}}{\binom{50}{10}} + \frac{\binom{40}{10}}{\binom{50}{10}} = 0.3487.$$
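
The acceptance probability in Example 7 is a sum of two hypergeometric probabilities and is easy to reproduce. The sketch below uses a helper `hypergeom_pmf` of our own naming that implements (40) directly.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """PMF (40): x marked marbles in a sample of n from N, of which M are marked."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Lot of 50 bulbs, 10 defective, sample of 10; accept if at most 1 defective.
accept = sum(hypergeom_pmf(x, 50, 10, 10) for x in (0, 1))
print(round(accept, 4))   # 0.3487
```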
Example 8. Suppose that an urn contains b white and c black balls, b + c = N.
A ball is drawn at random, and before drawing the next ball, s + 1 balls of the same
color are added to the urn. The procedure is repeated n times. Let X be the number
of white balls drawn in n draws, X = 0, 1, 2, ..., n. We shall find the PMF of X.

First note that the probability of drawing k white balls in successive draws is

$$\frac{b}{N} \cdot \frac{b + s}{N + s} \cdot \frac{b + 2s}{N + 2s} \cdots \frac{b + (k - 1)s}{N + (k - 1)s},$$

and the probability of drawing k white balls in the first k draws and then n - k black
balls in the next n - k draws is

(46)  $p_k = \dfrac{b}{N} \cdot \dfrac{b + s}{N + s} \cdots \dfrac{b + (k - 1)s}{N + (k - 1)s} \cdot \dfrac{c}{N + ks} \cdot \dfrac{c + s}{N + (k + 1)s} \cdots \dfrac{c + (n - k - 1)s}{N + (n - 1)s}.$

Here $p_k$ also gives the probability of drawing k white and n - k black balls in any
given order. It follows that

(47)  $P\{X = k\} = \binom{n}{k} p_k.$

An RV X with PMF given by (47) is said to have a Polya distribution. Let us write

$$Np = b, \qquad N(1 - p) = c, \qquad\text{and}\qquad N\alpha = s.$$

Then with q = 1 - p, we have

$$P\{X = k\} = \binom{n}{k} \frac{p(p + \alpha) \cdots [p + (k - 1)\alpha]\; q(q + \alpha) \cdots [q + (n - k - 1)\alpha]}{1(1 + \alpha) \cdots [1 + (n - 1)\alpha]}.$$

Let us take s = -1. This means that the ball drawn at each draw is not replaced in
the urn before drawing the next ball. In this case $\alpha = -1/N$, and we have

$$P\{X = k\} = \binom{n}{k} \frac{Np(Np - 1) \cdots [Np - (k - 1)]\; c(c - 1) \cdots [c - (n - k - 1)]}{N(N - 1) \cdots [N - (n - 1)]}$$

(48)  $\qquad = \dfrac{\binom{Np}{k}\binom{Nq}{n - k}}{\binom{N}{n}},$

which is a hypergeometric distribution. Here

(49)  $\max(0, n - Nq) \leq k \leq \min(n, Np).$
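
The reduction of the Polya PMF (47) to the hypergeometric PMF (48) when s = -1 can be verified exactly for a small urn. In this sketch the function names and the values b = 4, c = 6, n = 5 are our own choices; the argument `s` is the net change in the urn after each draw, as in (46).

```python
from fractions import Fraction
from math import comb

def polya_pmf(k, n, b, c, s):
    """PMF (47): C(n, k) * p_k with p_k as in (46)."""
    N = b + c
    prob = Fraction(comb(n, k))
    for i in range(k):                 # white factors b, b+s, ..., b+(k-1)s
        prob *= b + i * s
    for j in range(n - k):             # black factors c, c+s, ..., c+(n-k-1)s
        prob *= c + j * s
    for i in range(n):                 # denominators N, N+s, ..., N+(n-1)s
        prob /= N + i * s
    return prob

def hypergeom_pmf(k, n, b, c):
    """PMF (48) with Np = b and Nq = c."""
    return Fraction(comb(b, k) * comb(c, n - k), comb(b + c, n))

b, c, n = 4, 6, 5                      # illustrative values
print(all(polya_pmf(k, n, b, c, -1) == hypergeom_pmf(k, n, b, c)
          for k in range(n + 1)))
```

With s = 0 the same function returns the b(n, b/N) probabilities, which corresponds to sampling with replacement.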


Theorem 8. Let X and Y be independent RVs with PMFs b(m, p) and b(n, p), 
respectively. Then the conditional distribution of X, given X + Y, is hypergeometric. 


5.2.7 Negative Hypergeometric Distribution 


Consider the model of Section 5.2.6. A box contains N marbles; M of these are 
marked (or say defective) and N — M are unmarked. A sample of size n is taken, and 
let X denote the number of defective marbles in the sample. If the sample is drawn 
without replacement, we saw that X has a hypergeometric distribution with PMF 
(40). If, on the other hand, the sample is drawn with replacement, then X ~ b(n, p)
where p = M/N.

Let Y denote the number of draws needed to draw the rth defective marble. If 
the draws are made with replacement, then Y has the negative binomial distribution 
given in (22) with p = M/N. What if the draws are made without replacement? In 
that case in order that the kth draw (k > r) be the rth defective marble drawn, the 
kth draw must produce a defective marble, whereas the previous k — 1 draws must 
produce r — 1 defectives. It follows that 


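$$P\{Y = k\} = \frac{\binom{M}{r - 1}\binom{N - M}{k - r}}{\binom{N}{k - 1}} \cdot \frac{M - r + 1}{N - k + 1}$$

for k = r, r + 1, ..., N. Rewriting, we see that

(50)  $P\{Y = k\} = \dfrac{\binom{k - 1}{r - 1}\binom{N - k}{M - r}}{\binom{N}{M}}.$

An RV Y with PMF (50) is said to have a negative hypergeometric distribution.

It is easy to see that

$$EY = r\,\frac{N + 1}{M + 1}, \qquad EY(Y + 1) = \frac{r(r + 1)(N + 1)(N + 2)}{(M + 1)(M + 2)},$$

and

$$\operatorname{var}(Y) = \frac{r(N - M)(N + 1)(M + 1 - r)}{(M + 1)^{2}(M + 2)}.$$

Also, if $r/N \to 0$ and $k/N \to 0$ as $N \to \infty$, then

$$\frac{\binom{k - 1}{r - 1}\binom{N - k}{M - r}}{\binom{N}{M}} \to \binom{k - 1}{r - 1}\left(\frac{M}{N}\right)^{r}\left(1 - \frac{M}{N}\right)^{k-r},$$

which is (22), written in terms of the number of trials as in (28), with p = M/N.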


5.2.8 Poisson Distribution 
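Definition 3. An RV X is said to be a Poisson RV with parameter $\lambda > 0$ if its
PMF is given by

(51)  $P\{X = k\} = \dfrac{e^{-\lambda}\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots.$

We first check to see that (51) indeed defines a PMF. We have

$$\sum_{k=0}^{\infty} P\{X = k\} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^{k}}{k!} = e^{-\lambda} e^{\lambda} = 1.$$

If X has the PMF given by (51), we will write $X \sim P(\lambda)$. Clearly,

$$X = \sum_{k=0}^{\infty} k\, I_{[X = k]}$$

and

$$F(x) = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!}\, \varepsilon(x - k).$$

The mean and the variance are given by (see Problem 3.2.9)

(52)  $EX = \lambda, \qquad EX^{2} = \lambda + \lambda^{2},$

and

(53)  $\operatorname{var}(X) = \lambda.$

The MGF of X is given by (see Example 3.3.7)

(54)  $Ee^{tX} = \exp[\lambda(e^{t} - 1)]$

and the PGF by $P(s) = e^{-\lambda(1 - s)}$, $|s| \leq 1$.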


Theorem 9. Let $X_1, X_2, \ldots, X_n$ be independent Poisson RVs with $X_k \sim P(\lambda_k)$,
k = 1, 2, ..., n. Then $S_n = X_1 + X_2 + \cdots + X_n$ is a $P(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$ RV.

The converse of Theorem 9 is also true. Indeed, Raikov [82] showed that if
$X_1, X_2, \ldots, X_n$ are independent and $S_n = \sum_{i=1}^{n} X_i$ has a Poisson distribution, each
of the RVs $X_1, X_2, \ldots, X_n$ has a Poisson distribution.


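Example 9. The number of female insects in a given region follows a Poisson
distribution with mean $\lambda$. The number of eggs laid by each insect is a $P(\mu)$ RV. We
are interested in the probability distribution of the number of eggs in the region.

Let F be the number of female insects in the given region. Then

$$P\{F = f\} = \frac{e^{-\lambda}\lambda^{f}}{f!}, \qquad f = 0, 1, 2, \ldots.$$

Let Y be the total number of eggs laid in the region. Given F = f, Y is the sum of f
independent $P(\mu)$ RVs and hence is a $P(f\mu)$ RV (Theorem 9). Then

$$P\{Y = y,\ F = f\} = P\{F = f\}\, P\{Y = y \mid F = f\} = \frac{e^{-\lambda}\lambda^{f}}{f!} \cdot \frac{(f\mu)^{y} e^{-\mu f}}{y!},$$

so that

$$P\{Y = y\} = \frac{e^{-\lambda}\mu^{y}}{y!} \sum_{f=0}^{\infty} \frac{(\lambda e^{-\mu})^{f} f^{y}}{f!}.$$

The MGF of Y is given by

$$M(t) = \sum_{f=0}^{\infty} \frac{\lambda^{f} e^{-\lambda}}{f!} \sum_{y=0}^{\infty} \frac{e^{yt}(f\mu)^{y} e^{-\mu f}}{y!}$$

$$= \sum_{f=0}^{\infty} \frac{\lambda^{f} e^{-\lambda}}{f!} \exp\{f\mu(e^{t} - 1)\}$$

$$= e^{-\lambda} \sum_{f=0}^{\infty} \frac{[\lambda e^{\mu(e^{t} - 1)}]^{f}}{f!}$$

$$= e^{-\lambda} \exp\{\lambda e^{\mu(e^{t} - 1)}\}.$$

Theorem 10. Let X and Y be independent RVs with PMFs $P(\lambda_1)$ and $P(\lambda_2)$,
respectively. Then the conditional distribution of X, given X + Y, is binomial.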


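Proof. For nonnegative integers m and n, $m \leq n$, we have

$$P\{X = m \mid X + Y = n\} = \frac{P\{X = m,\ Y = n - m\}}{P\{X + Y = n\}}$$

$$= \frac{e^{-\lambda_1}(\lambda_1^{m}/m!)\; e^{-\lambda_2}(\lambda_2^{n-m}/(n - m)!)}{e^{-(\lambda_1 + \lambda_2)}(\lambda_1 + \lambda_2)^{n}/n!}$$

$$= \binom{n}{m} \frac{\lambda_1^{m}\lambda_2^{n-m}}{(\lambda_1 + \lambda_2)^{n}}$$

$$= \binom{n}{m} \left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{m} \left(1 - \frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^{n-m}, \qquad m = 0, 1, 2, \ldots, n,$$

and the proof is complete.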
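
Theorem 10 can be illustrated numerically. The sketch below assumes $\lambda_1 = 1.5$, $\lambda_2 = 2.5$, and n = 6 for illustration, and uses Theorem 9 for the distribution of X + Y; the function names are ours.

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(k, lam):
    """PMF (51)."""
    return exp(-lam) * lam**k / factorial(k)

def conditional_pmf(m, n, lam1, lam2):
    """P{X = m | X + Y = n} for independent X ~ P(lam1), Y ~ P(lam2)."""
    joint = poisson_pmf(m, lam1) * poisson_pmf(n - m, lam2)
    return joint / poisson_pmf(n, lam1 + lam2)   # X + Y ~ P(lam1 + lam2)

def binomial_pmf(m, n, p):
    return comb(n, m) * p**m * (1 - p)**(n - m)

lam1, lam2, n = 1.5, 2.5, 6                      # illustrative values
p = lam1 / (lam1 + lam2)
print(all(isclose(conditional_pmf(m, n, lam1, lam2), binomial_pmf(m, n, p))
          for m in range(n + 1)))
```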


Remark 2. The converse of this result is also true in the following sense. If X
and Y are independent nonnegative integer-valued RVs such that P{X = k} > 0,
P{Y = k} > 0, for k = 0, 1, 2, ..., and the conditional distribution of X, given
X + Y, is binomial, both X and Y are Poisson. This result is due to Chatterji [12].
For the proof, see Problem 13.

Theorem 11. If $X \sim P(\lambda)$ and the conditional distribution of Y, given X = x, is
b(x, p), then Y is a $P(\lambda p)$ RV.


Example 10 (Lamperti and Kruskal [58]). Let N be a nonnegative integer-valued
RV. Independent of each other, N balls are placed either in urn A with probability p
(0 < p < 1) or in urn B with probability 1 - p, resulting in $N_A$ balls in urn A and
$N_B = N - N_A$ balls in urn B. We will show that the RVs $N_A$ and $N_B$ are independent
if and only if N has a Poisson distribution. We have

$$P\{N_A = a \text{ and } N_B = b \mid N = a + b\} = \binom{a + b}{a} p^{a}(1 - p)^{b},$$

where a, b are integers $\geq 0$. Thus

$$P\{N_A = a,\ N_B = b\} = \binom{a + b}{a} p^{a} q^{b}\, P\{N = n\}, \qquad q = 1 - p, \quad n = a + b.$$

If N has a Poisson ($\lambda$) distribution, then

$$P\{N_A = a,\ N_B = b\} = \frac{(a + b)!}{a!\, b!}\, p^{a} q^{b}\, \frac{e^{-\lambda}\lambda^{a+b}}{(a + b)!} = \left(\frac{(\lambda p)^{a} e^{-\lambda p}}{a!}\right)\left(\frac{(\lambda q)^{b} e^{-\lambda q}}{b!}\right),$$

so that $N_A$ and $N_B$ are independent.

Conversely, if $N_A$ and $N_B$ are independent, then

$$P\{N = n\}\, n! = f(a)\, g(b)$$

for some functions f and g. Clearly, $f(0) \neq 0$, $g(0) \neq 0$ because $P\{N_A = 0, N_B = 0\} > 0$.
Thus there is a function h such that h(a + b) = f(a)g(b) for all nonnegative
integers a, b. It follows that

$$h(1) = f(1)g(0) = f(0)g(1),$$

$$h(2) = f(2)g(0) = f(1)g(1) = f(0)g(2),$$

and so on. By induction,

$$f(a) = f(1)\left[\frac{g(1)}{g(0)}\right]^{a-1}, \qquad g(b) = g(1)\left[\frac{f(1)}{f(0)}\right]^{b-1}.$$

We may write, for some $\alpha_1$, $\alpha_2$, $\lambda$,

$$f(a) = \alpha_1 e^{-a\lambda}, \qquad g(b) = \alpha_2 e^{-b\lambda},$$

and

$$P\{N = n\} = \alpha_1\alpha_2\, \frac{e^{-\lambda(a+b)}}{(a + b)!},$$

so that N is a Poisson RV.
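
The direct half of Example 10 can be checked numerically: when N is Poisson, the joint PMF of $(N_A, N_B)$ factors into two Poisson PMFs, while for a non-Poisson N the factorization fails. The sketch below is illustrative only; the parameter values and the degenerate counterexample (N fixed at 2) are our own choices.

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def joint_pmf(a, b, p, pmf_N):
    """P{N_A = a, N_B = b} = C(a+b, a) p^a q^b P{N = a+b}."""
    return comb(a + b, a) * p**a * (1 - p)**b * pmf_N(a + b)

lam, p = 3.0, 0.3                                  # illustrative values
factorizes = all(
    isclose(joint_pmf(a, b, p, lambda n: poisson_pmf(n, lam)),
            poisson_pmf(a, lam * p) * poisson_pmf(b, lam * (1 - p)))
    for a in range(8) for b in range(8)
)
print(factorizes)

# Counterexample: N degenerate at 2. Then P{N_A = 0, N_B = 0} = 0,
# although P{N_A = 0} = q^2 and P{N_B = 0} = p^2 are both positive.
fixed_two = lambda n: 1.0 if n == 2 else 0.0
print(joint_pmf(0, 0, p, fixed_two), (1 - p)**2 * p**2)
```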


5.2.9 Multinomial Distribution 


The binomial distribution is generalized in the following natural fashion. Suppose
that an experiment is repeated n times. Each replication of the experiment terminates
in one of k mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let $p_j$ be the
probability that the experiment terminates in $A_j$, j = 1, 2, ..., k, and suppose that
$p_j$ (j = 1, 2, ..., k) remains constant for all n replications. We assume that the n
replications are independent.

Let $x_1, x_2, \ldots, x_{k-1}$ be nonnegative integers such that $x_1 + x_2 + \cdots + x_{k-1} \leq n$.
Then the probability that exactly $x_i$ trials terminate in $A_i$, i = 1, 2, ..., k - 1, and
hence that $x_k = n - (x_1 + x_2 + \cdots + x_{k-1})$ trials terminate in $A_k$, is clearly

$$\frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.$$

If $(X_1, X_2, \ldots, X_k)$ is a random vector such that $X_j = x_j$ means that event $A_j$ has
occurred $x_j$ times, $x_j = 0, 1, 2, \ldots, n$, the joint PMF of $(X_1, X_2, \ldots, X_k)$ is given
by

(55)  $P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \begin{cases} \dfrac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} & \text{if } n = \sum_{i=1}^{k} x_i, \\ 0 & \text{otherwise.} \end{cases}$

Definition 4. An RV $(X_1, X_2, \ldots, X_{k-1})$ with joint PMF given by

(56)  $P\{X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}\} = \begin{cases} \dfrac{n!}{x_1!\, x_2! \cdots (n - x_1 - \cdots - x_{k-1})!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{\,n - x_1 - \cdots - x_{k-1}} & \text{if } x_1 + x_2 + \cdots + x_{k-1} \leq n, \\ 0 & \text{otherwise,} \end{cases}$

is said to have a multinomial distribution.
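For the MGF of $(X_1, X_2, \ldots, X_{k-1})$ we have

(57)  $M(t_1, t_2, \ldots, t_{k-1}) = Ee^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}$

$\qquad = \displaystyle\sum_{\substack{x_1, x_2, \ldots, x_{k-1} \geq 0 \\ x_1 + x_2 + \cdots + x_{k-1} \leq n}} e^{t_1 x_1 + \cdots + t_{k-1} x_{k-1}}\, \frac{n!\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}}{x_1!\, x_2! \cdots x_k!}$

$\qquad = \displaystyle\sum_{\substack{x_1, x_2, \ldots, x_{k-1} \geq 0 \\ x_1 + x_2 + \cdots + x_{k-1} \leq n}} \frac{n!}{x_1!\, x_2! \cdots x_k!}\, (p_1 e^{t_1})^{x_1} (p_2 e^{t_2})^{x_2} \cdots (p_{k-1} e^{t_{k-1}})^{x_{k-1}} p_k^{x_k}$

$\qquad = (p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k)^{n}$

for all $t_1, t_2, \ldots, t_{k-1} \in \mathcal{R}$.

Clearly,

$$M(t_1, 0, 0, \ldots, 0) = (p_1 e^{t_1} + p_2 + \cdots + p_k)^{n} = (1 - p_1 + p_1 e^{t_1})^{n},$$

which is binomial. Indeed, the marginal PMF of each $X_i$, i = 1, 2, ..., k - 1, is
binomial. Similarly, the joint MGF of $X_i$, $X_j$, i, j = 1, 2, ..., k - 1 ($i \neq j$), is

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = [p_i e^{t_i} + p_j e^{t_j} + 1 - p_i - p_j]^{n},$$

which is the MGF of a trinomial distribution with PMF

(58)  $f(x_i, x_j) = \dfrac{n!}{x_i!\, x_j!\, (n - x_i - x_j)!}\, p_i^{x_i} p_j^{x_j} p_k^{\,n - x_i - x_j}, \qquad p_k = 1 - p_i - p_j.$

Note that the RVs $X_1, X_2, \ldots, X_{k-1}$ are dependent.

From the MGF of $(X_1, X_2, \ldots, X_{k-1})$ or directly from the marginal PMFs we
can compute the moments. Thus

(59)  $EX_j = np_j \quad\text{and}\quad \operatorname{var}(X_j) = np_j(1 - p_j), \qquad j = 1, 2, \ldots, k - 1,$

and for i, j = 1, 2, ..., k - 1, and $i \neq j$,

(60)  $\operatorname{cov}(X_i, X_j) = E[(X_i - np_i)(X_j - np_j)] = -np_i p_j.$

It follows that the correlation coefficient between $X_i$ and $X_j$ is given by

(61)  $\rho_{ij} = -\left[\dfrac{p_i p_j}{(1 - p_i)(1 - p_j)}\right]^{1/2}, \qquad i, j = 1, 2, \ldots, k - 1 \quad (i \neq j).$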
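
The covariance formula (60) can be confirmed by exhaustive enumeration for a small trinomial case. The following sketch uses exact rational arithmetic; the values n = 4, $p_1 = 1/5$, $p_2 = 3/10$ and the function names are assumptions made only for this illustration.

```python
from fractions import Fraction
from math import factorial

def trinomial_pmf(x, y, n, p1, p2):
    """PMF (58) with p_k = 1 - p1 - p2."""
    p3 = 1 - p1 - p2
    coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return coef * p1**x * p2**y * p3**(n - x - y)

def covariance(n, p1, p2):
    """cov(X, Y) = E[XY] - EX * EY, by summing over the whole support."""
    support = [(x, y) for x in range(n + 1) for y in range(n + 1 - x)]
    ex = sum(x * trinomial_pmf(x, y, n, p1, p2) for x, y in support)
    ey = sum(y * trinomial_pmf(x, y, n, p1, p2) for x, y in support)
    exy = sum(x * y * trinomial_pmf(x, y, n, p1, p2) for x, y in support)
    return exy - ex * ey

n, p1, p2 = 4, Fraction(1, 5), Fraction(3, 10)   # illustrative values
print(covariance(n, p1, p2), -n * p1 * p2)       # both equal -6/25
```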
Example 11. Consider the trinomial distribution with PMF

$$P\{X = x,\ Y = y\} = \frac{n!}{x!\, y!\, (n - x - y)!}\, p_1^{x} p_2^{y} p_3^{\,n - x - y},$$

where x, y are nonnegative integers such that $x + y \leq n$, and $p_1, p_2, p_3 > 0$ with
$p_1 + p_2 + p_3 = 1$. The marginal PMF of X is given by

$$P\{X = x\} = \binom{n}{x} p_1^{x}(1 - p_1)^{n-x}, \qquad x = 0, 1, 2, \ldots, n.$$

It follows that

(62)  $P\{Y = y \mid X = x\} = \begin{cases} \dfrac{(n - x)!}{y!\, (n - x - y)!} \left(\dfrac{p_2}{1 - p_1}\right)^{y} \left(\dfrac{p_3}{1 - p_1}\right)^{n - x - y} & \text{if } y = 0, 1, 2, \ldots, n - x, \\ 0 & \text{otherwise,} \end{cases}$

which is $b(n - x, p_2/(1 - p_1))$. Thus

(63)  $E\{Y \mid x\} = (n - x)\, \dfrac{p_2}{1 - p_1}.$

Similarly,

(64)  $E\{X \mid y\} = (n - y)\, \dfrac{p_1}{1 - p_2}.$

Finally, we note that if $X = (X_1, X_2, \ldots, X_k)$ and $Y = (Y_1, Y_2, \ldots, Y_k)$ are
two independent multinomial RVs with common parameter $(p_1, p_2, \ldots, p_k)$, then
Z = X + Y is also a multinomial RV with probabilities $(p_1, p_2, \ldots, p_k)$. This
follows easily if one employs the MGF technique, using (57). Actually, this property
characterizes the multinomial distribution. If X and Y are k-dimensional, nonnegative,
independent random vectors, and if Z = X + Y is a multinomial random vector
with parameter $(p_1, p_2, \ldots, p_k)$, then X and Y also have multinomial distribution
with the same parameter. This result is due to Shanbhag and Basawa [101] and will
not be proved here.


5.2.10 Multivariate Hypergeometric Distribution 


Consider an urn containing N items divided into k categories containing $n_1, n_2, \ldots,
n_k$ items, respectively, where $\sum_{j=1}^{k} n_j = N$. A random sample, without replacement,
of size n is taken from the urn. Let $X_i$ = number of items in sample of type i.
Then

(65)  $P\{X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k\} = \displaystyle\prod_{j=1}^{k} \binom{n_j}{x_j} \Big/ \binom{N}{n},$

where $x_j = 0, 1, \ldots, \min(n, n_j)$, and $\sum_{j=1}^{k} x_j = n$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has multivariate hypergeometric distribution if
its joint PMF is given by (65). It is clear that each $X_j$ has a marginal hypergeometric
distribution. Moreover, the conditional distributions are also hypergeometric. Thus

$$P\{X_i = x_i \mid X_j = x_j\} = \frac{\binom{n_i}{x_i}\binom{N - n_i - n_j}{n - x_i - x_j}}{\binom{N - n_j}{n - x_j}}$$

and

$$P\{X_i = x_i \mid X_j = x_j,\ X_\ell = x_\ell\} = \frac{\binom{n_i}{x_i}\binom{N - n_i - n_j - n_\ell}{n - x_i - x_j - x_\ell}}{\binom{N - n_j - n_\ell}{n - x_j - x_\ell}},$$

and so on. It is therefore easy to write down the marginal and conditional means and
variances. We leave the reader to show that


and

$$\operatorname{cov}(X_i, X_j) = -\frac{N - n}{N - 1}\, n\, \frac{n_i n_j}{N^{2}}.$$

5.2.11 Multivariate Negative Binomial Distribution 


Consider the setup of Section 5.2.9, where each replication of an experiment
terminates in one of k mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. Let
$p_j = P(A_j)$, j = 1, 2, ..., k. Suppose that the experiment is repeated until event
$A_k$ is observed for the rth time, $r \geq 1$. Then

(66)  $P\{X_1 = x_1, X_2 = x_2, \ldots, X_{k-1} = x_{k-1}\} = \dfrac{(x_1 + x_2 + \cdots + x_{k-1} + r - 1)!}{\left(\prod_{j=1}^{k-1} x_j!\right)(r - 1)!}\; p_k^{r} \displaystyle\prod_{j=1}^{k-1} p_j^{x_j}$

for $x_i = 0, 1, 2, \ldots$ (i = 1, 2, ..., k - 1), $1 \leq r < \infty$, $0 < p_i < 1$, $\sum_{i=1}^{k-1} p_i < 1$,
and $p_k = 1 - \sum_{j=1}^{k-1} p_j$.

We say that $(X_1, X_2, \ldots, X_{k-1})$ has a multivariate negative binomial (or negative
multinomial) distribution if its joint PMF is given by (66).

It is easy to see that the marginal PMF of any subset of $\{X_1, X_2, \ldots, X_{k-1}\}$ is
negative multinomial. In particular, each $X_j$ has a negative binomial distribution.

We will leave the reader to show that

(67)  $M(s_1, s_2, \ldots, s_{k-1}) = Ee^{\sum_{j=1}^{k-1} s_j X_j} = p_k^{r}\left(1 - \displaystyle\sum_{j=1}^{k-1} p_j e^{s_j}\right)^{-r}$

and

(68)  $\operatorname{cov}(X_i, X_j) = \dfrac{r p_i p_j}{p_k^{2}}.$

PROBLEMS 5.2 


1. (a) Let us write

$$b(k; n, p) = \binom{n}{k} p^{k}(1 - p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n.$$

Show that as k goes from 0 to n, b(k; n, p) first increases monotonically and
then decreases monotonically. The greatest value is assumed when k = m,
where m is an integer such that

$$(n + 1)p - 1 < m \leq (n + 1)p$$

except that b(m - 1; n, p) = b(m; n, p) when m = (n + 1)p.

(b) If $k \geq np$, then

$$P\{X \geq k\} \leq b(k; n, p)\, \frac{(k + 1)(1 - p)}{k + 1 - (n + 1)p};$$

and if $k \leq np$, then

$$P\{X \leq k\} \leq b(k; n, p)\, \frac{(n - k + 1)p}{(n + 1)p - k}.$$

2. Generalize the result in Theorem 10 to n independent Poisson RVs; that is, if
$X_1, X_2, \ldots, X_n$ are independent RVs with $X_i \sim P(\lambda_i)$, i = 1, 2, ..., n, the
conditional distribution of $X_1, X_2, \ldots, X_n$, given $\sum_{i=1}^{n} X_i = t$, is multinomial
with parameters $t, \lambda_1/\sum_{1}^{n} \lambda_i, \ldots, \lambda_n/\sum_{1}^{n} \lambda_i$.


3. Let $X_1$, $X_2$ be independent RVs with $X_i \sim b(n_i, \tfrac{1}{2})$, i = 1, 2. What is the PMF
of $X_1 - X_2 + n_2$?

4. A box contains N identical balls numbered 1 through N. Of these balls, n are
drawn at a time. Let $X_1, X_2, \ldots, X_n$ denote the numbers on the n balls drawn.
Let $S_n = \sum_{i=1}^{n} X_i$. Find $\operatorname{var}(S_n)$.

5. From a box containing N identical balls marked 1 through N, M balls are
drawn one after another without replacement. Let $X_i$ denote the number on
the ith ball drawn, i = 1, 2, ..., M, $1 \leq M \leq N$. Let $Y = \max(X_1, X_2,
\ldots, X_M)$. Find the DF and the PMF of Y. Also find the conditional distribution
of $X_1, X_2, \ldots, X_M$, given Y = y. Find EY and var(Y).

6. Let f(x; r, p), x = 0, 1, 2, ..., denote the PMF of an NB(r; p) RV. Show that
the terms f(x; r, p) first increase monotonically and then decrease monotonically.
When is the greatest value assumed?


7. Show that the terms

$$P_\lambda\{X = k\} = e^{-\lambda}\frac{\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots,$$

of the Poisson PMF reach their maxima when k is the largest integer $\leq \lambda$ and at
$(\lambda - 1)$ and $\lambda$ if $\lambda$ is an integer.

8. Show that

$$\binom{n}{k} p^{k}(1 - p)^{n-k} \to e^{-\lambda}\frac{\lambda^{k}}{k!}$$

as $n \to \infty$ and $p \to 0$, so that $np = \lambda$ remains constant. (Hint: Use Stirling's
approximation, namely, $n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$ as $n \to \infty$.)

9. A biased coin is tossed indefinitely. Let p (0 < p < 1) be the probability of
success (heads). Let $Y_1$ denote the length of the first run and $Y_2$ be the length of
the second run. Find the PMFs of $Y_1$ and $Y_2$, and show that $EY_1 = q/p + p/q$,
$EY_2 = 2$. If $Y_n$ denotes the length of the nth run, $n \geq 1$, what is the PMF of $Y_n$?
Find $EY_n$.

10. Show that

$$\binom{N}{n}^{-1}\binom{Np}{k}\binom{N(1 - p)}{n - k} \to \binom{n}{k} p^{k}(1 - p)^{n-k}$$

as $N \to \infty$.

11. Show that

$$\binom{r + k - 1}{k} p^{r}(1 - p)^{k} \to e^{-\lambda}\frac{\lambda^{k}}{k!}$$

as $p \to 1$ and $r \to \infty$ in such a way that $r(1 - p) = \lambda$ remains fixed.

12. Let X and Y be independent geometric RVs. Show that min(X, Y) and X - Y
are independent.

13. Let X and Y be independent RVs with PMFs $P\{X = k\} = p_k$, $P\{Y = k\} = q_k$,
k = 0, 1, 2, ..., where $p_k, q_k > 0$ and $\sum_{k=0}^{\infty} p_k = \sum_{k=0}^{\infty} q_k = 1$. Let

$$P\{X = k \mid X + Y = t\} = \binom{t}{k} \alpha_t^{k}(1 - \alpha_t)^{t-k}, \qquad 0 \leq k \leq t.$$

Then $\alpha_t = \alpha$ for all t, and

$$p_k = \frac{e^{-\theta\beta}(\theta\beta)^{k}}{k!} \qquad\text{and}\qquad q_k = \frac{e^{-\theta}\theta^{k}}{k!},$$

where $\beta = \alpha/(1 - \alpha)$, and $\theta > 0$ is arbitrary. (Chatterji [12])

14. Generalize the result of Example 10 to the case of k urns, $k \geq 3$.

15. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters n,
$p_1, p_2, \ldots, p_{k-1}$. Write

$$Y = \sum_{i=1}^{k} \frac{(X_i - np_i)^{2}}{np_i},$$

where $p_k = 1 - p_1 - \cdots - p_{k-1}$, and $X_k = n - X_1 - \cdots - X_{k-1}$. Find EY and
var(Y).

16. Let $X_1$, $X_2$ be iid RVs with common DF F, having positive mass at 0, 1, 2, ....
Also, let $U = \max(X_1, X_2)$ and $V = X_1 - X_2$. Then

$$P\{U = j,\ V = 0\} = P\{U = j\}\, P\{V = 0\}$$

for all j if and only if F is a geometric distribution. (Srivastava [107])

17. Let X and Y be mutually independent RVs, taking nonnegative integer values.
Then

$$P\{X \leq n\} - P\{X + Y \leq n\} = \alpha P\{X + Y = n\}$$

holds for n = 0, 1, 2, ... and some $\alpha > 0$ if and only if

$$P\{Y = n\} = \frac{1}{1 + \alpha}\left(\frac{\alpha}{1 + \alpha}\right)^{n}, \qquad n = 0, 1, 2, \ldots.$$

(Hint: Use Problem 3.3.8.) (Puri [81])

18. Let $X_1, X_2, \ldots$ be a sequence of independent b(1, p) RVs with 0 < p < 1.
Also, let $Z_N = \sum_{i=1}^{N} X_i$, where N is a $P(\lambda)$ RV that is independent of the $X_i$'s.
Show that $Z_N$ and $N - Z_N$ are independent.

19. Prove Theorems 5, 7, 8, and 11.


5.3 SOME CONTINUOUS DISTRIBUTIONS 


In this section we study some of the most frequently used absolutely continuous
distributions and describe their important properties. Before we introduce specific
distributions it should be remarked that associated with each PDF f there is an index or a
parameter $\theta$ (may be multidimensional) which takes values in an index set $\Theta$. For
any particular choice of $\theta \in \Theta$ we obtain a specific PDF $f_\theta$ from the family of PDFs
$\{f_\theta, \theta \in \Theta\}$.

Let X be an RV with PDF $f_\theta(x)$, where $\theta$ is a real-valued parameter. We say that
$\theta$ is a location parameter and $\{f_\theta\}$ is a location family if $X - \theta$ has PDF f(x) which
does not depend on $\theta$. The parameter $\theta$ is said to be a scale parameter and $\{f_\theta\}$ is a
scale family of PDFs if $X/\theta$ has PDF f(x) which is free of $\theta$. If $\theta = (\mu, \sigma)$ is
two-dimensional, we say that $\theta$ is a location-scale parameter if the PDF of $(X - \mu)/\sigma$ is
free of $\mu$ and $\sigma$. In that case, $\{f_\theta\}$ is known as a location-scale family.

It is easily seen that $\theta$ is a location parameter if and only if $f_\theta(x) = f(x - \theta)$,
a scale parameter if and only if $f_\theta(x) = (1/\theta) f(x/\theta)$, and a location-scale parameter if
$f_\theta(x) = (1/\sigma) f((x - \mu)/\sigma)$, $\sigma > 0$ for some PDF f. The density f is called the
standard PDF for the family $\{f_\theta, \theta \in \Theta\}$.

A location parameter simply relocates or shifts the graph of PDF f without changing
its shape. A scale parameter stretches (if $\theta > 1$) or contracts (if $\theta < 1$) the graph
of f. A location-scale parameter, on the other hand, stretches or contracts the graph
of f with the scale parameter and then shifts the graph to locate at $\mu$ (see Fig. 1).

Some PDFs also have a shape parameter. Changing its value alters the shape of
the graph. For the Poisson distribution $\lambda$ is a shape parameter.

Fig. 1. (a) Exponential location family; (b) exponential scale family; (c) normal location-scale family; (d) shape parameter family $f_\theta(x) = \theta x^{\theta - 1}$.

For the following PDF,

$$f(x; \mu, \beta, \alpha) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x - \mu}{\beta}\right)^{\alpha - 1} \exp\left\{-\frac{x - \mu}{\beta}\right\}, \qquad x > \mu,$$

and = 0 otherwise, $\mu$ is a location, $\beta$ a scale, and $\alpha$ a shape parameter. The standard
density for this location-scale family is

$$f(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x}, \qquad x > 0,$$

and = 0 otherwise. For the standard PDF f, $\alpha$ is a shape parameter.


5.3.1. Uniform Distribution (Rectangular Distribution) 
Definition 1. An RV X is said to have a uniform distribution on the interval
[a, b], $-\infty < a < b < \infty$, if its PDF is given by

(1)  $f(x) = \begin{cases} \dfrac{1}{b - a}, & a \leq x \leq b, \\ 0, & \text{otherwise.} \end{cases}$

We will write X ~ U[a, b] if X has a uniform distribution on [a, b].
The endpoint a or b or both may be excluded. Clearly,

$$\int_{-\infty}^{\infty} f(x)\, dx = 1,$$

so that (1) indeed defines a PDF. The DF of X is given by

(2)  $F(x) = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \leq x < b, \\ 1, & b \leq x; \end{cases}$

(3)  $EX = \dfrac{a + b}{2}, \qquad EX^{k} = \dfrac{b^{k+1} - a^{k+1}}{(k + 1)(b - a)}, \quad k > 0 \text{ is an integer};$

(4)  $\operatorname{var}(X) = \dfrac{(b - a)^{2}}{12};$

(5)  $M(t) = \dfrac{1}{t(b - a)}\,(e^{tb} - e^{ta}), \qquad t \neq 0.$

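Example 1. Let X have a PDF given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & 0 < x < \infty, \quad \lambda > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$F(x) = \begin{cases} 0, & x \leq 0, \\ 1 - e^{-\lambda x}, & x > 0. \end{cases}$$

Let $Y = F(X) = 1 - e^{-\lambda X}$. The PDF of Y is given by

$$f_Y(y) = \frac{1}{\lambda} \cdot \frac{1}{1 - y} \cdot \lambda e^{-\lambda \cdot (-1/\lambda)\log(1 - y)} = 1, \qquad 0 \leq y < 1.$$

Let us define $f_Y(y) = 1$ at y = 1. Then we see that Y has density function

$$f_Y(y) = \begin{cases} 1, & 0 \leq y \leq 1, \\ 0, & \text{otherwise,} \end{cases}$$

which is the U[0, 1] distribution. That this is not a mere coincidence is shown in the
following theorem.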


Theorem 1 (Probability Integral Transformation). Let X be an RV with a 
continuous DF F. Then F(X) has the uniform distribution on [0, 1]. 


The proof is left as an exercise.
The reader is asked to consider what happens in the case where F is the DF of a
discrete RV. In the converse direction the following result holds.


Theorem 2. Let F be any DF, and let X be a U[0, 1] RV. Then there exists a
function h such that h(X) has DF F, that is,

(6)  $P\{h(X) \leq x\} = F(x) \quad \text{for all } x \in (-\infty, \infty).$

Proof. If F is the DF of a discrete RV Y, let

$$P\{Y = y_k\} = p_k, \qquad k = 1, 2, \ldots.$$

Define h as follows:

$$h(x) = \begin{cases} y_1 & \text{if } 0 \leq x < p_1, \\ y_2 & \text{if } p_1 \leq x < p_1 + p_2, \\ \;\vdots \end{cases}$$

Then

$$P\{h(X) = y_1\} = P\{0 \leq X < p_1\} = p_1,$$

$$P\{h(X) = y_2\} = P\{p_1 \leq X < p_1 + p_2\} = p_2,$$

and, in general,

$$P\{h(X) = y_k\} = p_k, \qquad k = 1, 2, 3, \ldots.$$

Thus h(X) is a discrete RV with DF F.

If F is continuous and strictly increasing, $F^{-1}$ is well defined, and we take
$h(X) = F^{-1}(X)$. We have

$$P\{h(X) \leq x\} = P\{F^{-1}(X) \leq x\} = P\{X \leq F(x)\} = F(x),$$

as asserted.

In general, define

(7)  $F^{-1}(y) = \inf\{x : F(x) \geq y\},$

and let $h(X) = F^{-1}(X)$. Then we have

(8)  $\{F^{-1}(y) \leq x\} = \{y \leq F(x)\}.$

Indeed, $F^{-1}(y) \leq x$ implies that for every $\varepsilon > 0$, $y \leq F(x + \varepsilon)$. Since $\varepsilon > 0$ is
arbitrary and F is continuous on the right, we let $\varepsilon \to 0$ and conclude that $y \leq F(x)$.
Since $y \leq F(x)$ implies that $F^{-1}(y) \leq x$ by definition (7), it follows that (8) holds
generally. Thus

$$P\{F^{-1}(X) \leq x\} = P\{X \leq F(x)\} = F(x).$$


Theorem 2 is quite useful in generating samples with the help of the uniform 
distribution. 
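
For a discrete DF, the function h constructed in the proof of Theorem 2 translates directly into a sampling routine. The following is a minimal sketch, not a definitive implementation; `make_h` is a hypothetical helper name, and the three-point distribution used at the end is an arbitrary illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def make_h(values, probs):
    """Return h with h(u) = y_k whenever p_1+...+p_{k-1} <= u < p_1+...+p_k."""
    cum = list(accumulate(probs))          # partial sums p_1, p_1+p_2, ...
    def h(u):
        k = bisect_right(cum, u)           # number of partial sums <= u
        return values[min(k, len(values) - 1)]   # guard against round-off at u near 1
    return h

h = make_h(["y1", "y2", "y3"], [0.25, 0.5, 0.25])   # illustrative PMF
print(h(0.1), h(0.25), h(0.7), h(0.9))              # y1 y2 y2 y3

rng = random.Random(0)                     # seeded only to make the run repeatable
sample = [h(rng.random()) for _ in range(10_000)]
print(sample.count("y2") / len(sample))    # should be near 0.5
```

The left-closed, right-open intervals in the code match those in the definition of h, so that a uniform variate equal to $p_1$ is mapped to $y_2$.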


Example 2. Let F be the DF defined by

$$F(x) = \begin{cases} 0, & x \leq 0, \\ 1 - e^{-x}, & x > 0. \end{cases}$$

Then the inverse to $y = 1 - e^{-x}$, x > 0, is $x = -\log(1 - y)$, 0 < y < 1. Thus

$$h(y) = -\log(1 - y),$$

and $-\log(1 - X)$ has the required distribution, where X is a U[0, 1] RV.
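
Example 2 gives a recipe for generating observations from this DF out of uniform random numbers. A brief sketch follows; it assumes Python's `random.random()`, which returns values in [0, 1), as the source of U[0, 1] variates, and the function names are ours.

```python
import math
import random

def exp_cdf(x):
    """DF of Example 2: F(x) = 1 - e^{-x} for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def h(u):
    """h(u) = -log(1 - u), the inverse of F on [0, 1)."""
    return -math.log(1.0 - u)

rng = random.Random(0)                 # seeded only to make the run repeatable
sample = [h(rng.random()) for _ in range(10_000)]
print(sum(sample) / len(sample))       # should be near 1, the mean of this distribution
print(exp_cdf(h(0.3)))                 # recovers 0.3 up to rounding
```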


Theorem 3. Let X be an RV defined on [0, 1]. If $P\{x < X \leq y\}$ depends only
on y - x for all $0 \leq x \leq y \leq 1$, then X is U[0, 1].

Proof. Let $P\{x < X \leq y\} = f(y - x)$; then $f(x + y) = P\{0 < X \leq x + y\} =
P\{0 < X \leq x\} + P\{x < X \leq x + y\} = f(x) + f(y)$. Note that f is continuous
from the right. We have

$$f(x) = f(x) + f(0),$$

so that

$$f(0) = 0.$$

We will show that f(x) = cx for some constant c. It suffices to prove the result for
positive x. Let m be an integer; then

$$f(mx) = f(x) + \cdots + f(x) = m f(x).$$

Letting x = n/m, we get

$$f\left(m \cdot \frac{n}{m}\right) = m f\left(\frac{n}{m}\right),$$

so that

$$f\left(\frac{n}{m}\right) = \frac{1}{m} f(n) = \frac{n}{m} f(1),$$

for positive integers n and m. Letting f(1) = c, we have proved that

$$f(x) = cx$$

for rational numbers x.

To complete the proof we consider the case where x is a positive irrational number.
Then we can find a decreasing sequence of positive rationals $x_1, x_2, \ldots$ such that
$x_n \to x$. Since f is right continuous,

$$f(x) = \lim_{x_n \downarrow x} f(x_n) = \lim_{x_n \downarrow x} c x_n = cx.$$

Now, for $0 \leq x \leq 1$,

$$F(x) = P\{X \leq 0\} + P\{0 < X \leq x\} = F(0) + P\{0 < X \leq x\} = f(x) = cx, \qquad 0 \leq x \leq 1.$$

Since F(1) = 1, we must have c = 1, so that

$$F(x) = x, \qquad 0 \leq x \leq 1.$$

This completes the proof.




5.3.2 Gamma Distribution 
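The integral

(9)  $\Gamma(\alpha) = \displaystyle\int_{0+}^{\infty} x^{\alpha - 1} e^{-x}\, dx$

converges or diverges according as $\alpha > 0$ or $\leq 0$. For $\alpha > 0$ the integral in (9) is
called the gamma function. In particular, if $\alpha = 1$, $\Gamma(1) = 1$. If $\alpha > 1$, integration
by parts yields

(10)  $\Gamma(\alpha) = (\alpha - 1) \displaystyle\int_{0}^{\infty} x^{\alpha - 2} e^{-x}\, dx = (\alpha - 1)\Gamma(\alpha - 1).$

If $\alpha = n$ is a positive integer, then

(11)  $\Gamma(n) = (n - 1)!.$

Also writing $x = y^{2}/2$ in $\Gamma(\tfrac{1}{2})$, we see that

$$\Gamma\left(\tfrac{1}{2}\right) = \frac{1}{\sqrt{2}} \int_{-\infty}^{\infty} e^{-y^{2}/2}\, dy.$$

Now consider the integral $I = \int_{-\infty}^{\infty} e^{-y^{2}/2}\, dy$. We have

$$I^{2} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{x^{2} + y^{2}}{2}\right\} dx\, dy,$$

and changing to polar coordinates, we get

$$I^{2} = \int_{0}^{2\pi}\int_{0}^{\infty} r \exp\left(-\frac{r^{2}}{2}\right) dr\, d\theta = 2\pi.$$

It follows that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.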
Let us write $x = y/\beta$, $\beta > 0$, in the integral in (9). Then


(12)  $\Gamma(\alpha) = \int_{0}^{\infty} \frac{y^{\alpha-1}}{\beta^{\alpha}} e^{-y/\beta}\,dy,$


so that


(13)  $\int_{0}^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}\,dy = 1.$

Since the integrand in (13) is positive for $y > 0$, it follows that the function


(14)  $f(y) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}, & 0 < y < \infty, \\ 0, & y \le 0, \end{cases}$

defines a PDF for $\alpha > 0$, $\beta > 0$.


Definition 2. An RV $X$ with PDF defined by (14) is said to have a gamma distribution
with parameters $\alpha$ and $\beta$. We will write $X \sim G(\alpha, \beta)$.


Figure 2 gives graphs of some gamma PDFs. 
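The DF of a $G(\alpha, \beta)$ RV is given by


(15)  $F(x) = \begin{cases} 0, & x \le 0, \\ \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}} \displaystyle\int_{0}^{x} y^{\alpha-1} e^{-y/\beta}\,dy, & x > 0. \end{cases}$


The MGF of $X$ is easily computed. We have


(16)  $M(t) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} e^{x(t - 1/\beta)} x^{\alpha-1}\,dx$

$\phantom{(16)  M(t)} = \left(\frac{1}{1 - \beta t}\right)^{\alpha} \int_{0}^{\infty} \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha)}\,dy$

$\phantom{(16)  M(t)} = (1 - \beta t)^{-\alpha}, \qquad t < \frac{1}{\beta}.$


It follows that


(17)  $EX = M'(t)|_{t=0} = \alpha\beta$

and

(18)  $EX^2 = M''(t)|_{t=0} = \alpha(\alpha + 1)\beta^2,$
so that

(19)  $\operatorname{var}(X) = \alpha\beta^2.$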
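As a quick numerical sanity check of (17) and (19), one can simulate gamma variates and compare sample moments with $\alpha\beta$ and $\alpha\beta^2$. The sketch below relies on Python's `random.gammavariate(alpha, beta)`, whose documented density has the same shape/scale form as (14); the parameter values and the seed are arbitrary choices made for illustration.

```python
import random

alpha, beta = 2.5, 1.5             # illustrative parameter values
rng = random.Random(2024)          # fixed seed, chosen arbitrarily

# gammavariate(alpha, beta): alpha is the shape, beta is the scale, as in (14)
xs = [rng.gammavariate(alpha, beta) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

print(mean, alpha * beta)          # compare with (17)
print(var, alpha * beta ** 2)      # compare with (19)
```

Agreement is only up to Monte Carlo error, so the two numbers on each line should be close but not identical.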


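Indeed, we can compute the moment of order $n$ such that $\alpha + n > 0$ directly from
the density. We have


(20)  $EX^n = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} e^{-x/\beta} x^{\alpha+n-1}\,dx$

$\phantom{(20)  EX^n} = \beta^n\, \frac{\Gamma(\alpha + n)}{\Gamma(\alpha)}$

$\phantom{(20)  EX^n} = \beta^n (\alpha + n - 1)(\alpha + n - 2) \cdots \alpha,$

where the last equality holds when $n$ is a positive integer.

The special case when $\alpha = 1$ leads to the exponential distribution with parameter
$\beta$. The PDF of an exponentially distributed RV is therefore


(21)  $f(x) = \begin{cases} \beta^{-1} e^{-x/\beta}, & x > 0, \\ 0, & \text{otherwise}. \end{cases}$


Note that we can speak of the exponential distribution on $(-\infty, 0)$. The PDF of such
an RV is


(22)  $f(x) = \begin{cases} \beta^{-1} e^{x/\beta}, & x < 0, \\ 0, & x \ge 0. \end{cases}$

Clearly, if $X \sim G(1, \beta)$, we have
(23)  $EX^n = n!\,\beta^n,$

(24)  $EX = \beta$ and $\operatorname{var}(X) = \beta^2,$
and
(25)  $M(t) = (1 - \beta t)^{-1}$ for $t < \beta^{-1}.$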


Another special case of importance is when $\alpha = n/2$, $n > 0$ (an integer), and
$\beta = 2$.


Definition 3. An RV $X$ is said to have a chi-square distribution ($\chi^2$-distribution)
with $n$ degrees of freedom, where $n$ is a positive integer, if its PDF is given by


(26)  $f(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\,2^{n/2}}\, e^{-x/2} x^{n/2-1}, & 0 < x < \infty, \\ 0, & x \le 0. \end{cases}$


We will write $X \sim \chi^2(n)$ for a $\chi^2$ RV with $n$ degrees of freedom (d.f.).


If $X \sim \chi^2(n)$, then


(27)  $EX = n, \qquad \operatorname{var}(X) = 2n,$

(28)  $EX^k = \dfrac{2^k\, \Gamma[(n/2) + k]}{\Gamma(n/2)},$
and
(29)  $M(t) = (1 - 2t)^{-n/2}$ for $t < \tfrac{1}{2}.$


Theorem 4. Let $X_1, X_2, \ldots, X_n$ be independent RVs such that $X_j \sim G(\alpha_j, \beta)$,
$j = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^{n} X_k$ is a $G(\sum_{j=1}^{n} \alpha_j, \beta)$ RV.

Corollary 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, each with an exponential distribution
with parameter $\beta$. Then $S_n$ is a $G(n, \beta)$ RV.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs such that $X_j \sim \chi^2(r_j)$,
$j = 1, 2, \ldots, n$, then $S_n$ is a $\chi^2(\sum_{j=1}^{n} r_j)$ RV.


Theorem 5. Let $X \sim U(0, 1)$. Then $Y = -2 \log X$ is $\chi^2(2)$.


Corollary. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common distribution $U(0, 1)$.
Then $-2 \sum_{i=1}^{n} \log X_i = 2 \log(1/\prod_{i=1}^{n} X_i)$ is $\chi^2(2n)$.
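Theorem 5 is easy to probe by simulation. The following sketch (seed, sample size, and the cutoff 3 are illustrative assumptions) transforms uniform draws and compares the sample mean, variance, and one tail probability with the $\chi^2(2)$ values $2$, $4$, and $P\{Y > 3\} = e^{-3/2}$.

```python
import math
import random

rng = random.Random(7)             # fixed seed, chosen arbitrarily
n = 100_000

# 1 - random() lies in (0, 1], which avoids log(0)
ys = [-2.0 * math.log(1.0 - rng.random()) for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
tail = sum(y > 3.0 for y in ys) / n

# chi-square with 2 d.f. is G(1, 2): mean 2, variance 4, P{Y > y} = exp(-y/2)
print(mean, var, tail, math.exp(-1.5))
```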


Theorem 6. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent RVs. Then
$X + Y$ and $X/Y$ are independent.


Corollary. Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent RVs. Then
$X + Y$ and $X/(X + Y)$ are independent.


The converse of Theorem 6 is also true. The result is due to Lukacs [66], and we
state it without proof.


Theorem 7. Let $X$ and $Y$ be two nondegenerate RVs that take only positive values.
Suppose that $U = X + Y$ and $V = X/Y$ are independent. Then $X$ and $Y$ have
gamma distributions with the same parameter $\beta$.

Theorem 8. Let $X \sim G(1, \beta)$. Then the RV $X$ has "no memory," that is,

(30)  $P\{X > r + s \mid X > s\} = P\{X > r\}$


for any two positive real numbers $r$ and $s$.


The proof is left as an exercise.
The converse of Theorem 8 is also true in the following sense. 


Theorem 9. Let $F$ be a DF such that $F(x) = 0$ if $x \le 0$, $F(x) < 1$ if $x > 0$, and


(31)  $\dfrac{1 - F(x + y)}{1 - F(y)} = 1 - F(x)$ for all $x, y > 0$.

Then there exists a constant $\beta > 0$ such that

(32)  $1 - F(x) = e^{-x/\beta}, \qquad x > 0.$

Proof. Equation (31) is equivalent to


$g(x + y) = g(x) + g(y)$

if we write $g(x) = \log\{1 - F(x)\}$. From the proof of Theorem 3 it is clear that the
only right continuous solution is $g(x) = cx$. Hence $F(x) = 1 - e^{cx}$, $x \ge 0$. Since
$F(x) \to 1$ as $x \to \infty$, it follows that $c < 0$, and taking $\beta = -1/c$ completes the proof.


Theorem 10. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then $X_i \sim G(1, n\beta)$, $i =
1, 2, \ldots, n$, if and only if $X_{(1)}$ is $G(1, \beta)$.


Note that, if $X_1, X_2, \ldots, X_n$ are independent with $X_i \sim G(1, \beta_i)$, $i = 1, 2, \ldots, n$,
then $X_{(1)}$ is a $G(1, 1/\sum_{i=1}^{n} \beta_i^{-1})$ RV.
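The note on the minimum of independent exponentials can be illustrated numerically. In this sketch the means $\beta_i$, the seed, and the sample size are arbitrary choices; note that Python's `random.expovariate` takes the rate $1/\beta_i$ rather than the mean.

```python
import random

betas = [1.0, 2.0, 4.0]                       # illustrative means beta_i
rng = random.Random(99)                       # fixed seed, chosen arbitrarily
n = 100_000

# expovariate takes the rate 1/beta_i, not the mean beta_i
mins = [min(rng.expovariate(1.0 / b) for b in betas) for _ in range(n)]

predicted_mean = 1.0 / sum(1.0 / b for b in betas)   # mean of G(1, 1/sum(1/beta_i))
sample_mean = sum(mins) / n
print(sample_mean, predicted_mean)
```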

The following result describes the relationship between exponential and Poisson 
RVs. 


Theorem 11. Let $X_1, X_2, \ldots$ be a sequence of iid RVs having common exponential
density with parameter $\beta > 0$. Let $S_n = \sum_{k=1}^{n} X_k$ be the $n$th partial sum,
$n = 1, 2, \ldots$, and suppose that $t > 0$. If $Y$ = number of $S_n \in [0, t]$, then $Y$ is a
$P(t/\beta)$ RV.


Proof. We have

$P\{Y = 0\} = P\{S_1 > t\} = \frac{1}{\beta} \int_{t}^{\infty} e^{-x/\beta}\,dx = e^{-t/\beta},$


so that the assertion holds for $Y = 0$. Let $n$ be a positive integer. Since the $X_i$'s are
nonnegative, $S_n$ is nondecreasing, and


(33)  $P\{Y = n\} = P\{S_n \le t,\ S_{n+1} > t\}.$

Now

(34)  $P\{S_n \le t\} = P\{S_n \le t,\ S_{n+1} > t\} + P\{S_{n+1} \le t\}.$

It follows that

(35)  $P\{Y = n\} = P\{S_n \le t\} - P\{S_{n+1} \le t\},$


and since $S_n \sim G(n, \beta)$, we have


$P\{Y = n\} = \int_{0}^{t} \frac{1}{\Gamma(n)\beta^{n}}\, x^{n-1} e^{-x/\beta}\,dx - \int_{0}^{t} \frac{1}{\Gamma(n+1)\beta^{n+1}}\, x^{n} e^{-x/\beta}\,dx$

$\phantom{P\{Y = n\}} = \frac{t^{n} e^{-t/\beta}}{\beta^{n}\, n!},$


as asserted.
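Theorem 11 can be checked empirically by counting partial sums of exponential variates. The helper name `count_partial_sums`, the values of $\beta$ and $t$, and the seed below are assumptions made for this illustration only.

```python
import math
import random

def count_partial_sums(rng, beta, t):
    """Number of partial sums S_n = X_1 + ... + X_n that fall in [0, t]."""
    total, count = 0.0, 0
    while True:
        total += rng.expovariate(1.0 / beta)   # exponential with mean beta
        if total > t:
            return count
        count += 1

beta, t = 0.5, 2.0                 # illustrative values, so t / beta = 4
rng = random.Random(31)            # fixed seed, chosen arbitrarily
n = 50_000
counts = [count_partial_sums(rng, beta, t) for _ in range(n)]

mean_count = sum(counts) / n
p_zero = counts.count(0) / n
print(mean_count, t / beta)              # Poisson mean t / beta
print(p_zero, math.exp(-t / beta))       # P{Y = 0} = exp(-t / beta)
```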


Theorem 12. If $X$ and $Y$ are independent exponential RVs with parameter $\beta$,
then $Z = X/(X + Y)$ has a $U(0, 1)$ distribution.

Note that in view of Theorem 7, Theorem 12 characterizes the exponential distribution
in the following sense. Let $X$ and $Y$ be independent RVs that are nondegenerate
and take only positive values. Suppose that $X + Y$ and $X/Y$ are independent. If
$X/(X + Y)$ is $U(0, 1)$, then $X$ and $Y$ both have the exponential distribution with
parameter $\beta$. This follows since by Theorem 7, $X$ and $Y$ must have the gamma distribution
with parameter $\beta$. Thus $X/(X + Y)$ must have (see Theorem 14) the PDF


$f(x) = \dfrac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\, x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}, \qquad 0 < x < 1,$


and this is the uniform density on $(0, 1)$ if and only if $\alpha_1 = \alpha_2 = 1$. Thus $X$ and $Y$
both have the $G(1, \beta)$ distribution.


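Theorem 13. Let $X$ be a $P(\lambda)$ RV. Then

(36)  $P\{X \le K\} = \dfrac{1}{K!} \displaystyle\int_{\lambda}^{\infty} e^{-x} x^{K}\,dx$


expresses the DF of $X$ in terms of an incomplete gamma function.


Proof.

$\dfrac{d}{d\lambda} P\{X \le K\} = \sum_{j=0}^{K} \frac{1}{j!}\left\{ j e^{-\lambda} \lambda^{j-1} - \lambda^{j} e^{-\lambda} \right\} = -\dfrac{\lambda^{K} e^{-\lambda}}{K!},$


and it follows that

$P\{X \le K\} = \dfrac{1}{K!} \displaystyle\int_{\lambda}^{\infty} e^{-x} x^{K}\,dx,$

as asserted.
An alternative way of writing (36) is the following:
$P\{X \le K\} = P\{Y \ge 2\lambda\},$
where $X \sim P(\lambda)$ and $Y \sim \chi^2(2K + 2)$.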
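Identity (36) can be verified numerically for particular values. The sketch below sums the Poisson PMF directly and compares it with a composite Simpson approximation of the integral; the helper names, the truncation point 60, and the values of $K$ and $\lambda$ are illustrative assumptions.

```python
import math

def poisson_cdf(K, lam):
    """P{X <= K} for X ~ P(lam), summed directly from the PMF."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(K + 1))

def upper_gamma_integral(K, lam, upper=60.0, steps=20_000):
    """(1/K!) * integral of e^{-x} x^K over [lam, upper] by composite Simpson.

    The infinite upper limit is truncated at `upper`; the neglected tail is tiny.
    `steps` must be even.
    """
    h = (upper - lam) / steps
    g = lambda x: math.exp(-x) * x ** K
    total = g(lam) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lam + i * h)
    return total * h / 3 / math.factorial(K)

K, lam = 3, 2.5                    # illustrative values
lhs = poisson_cdf(K, lam)
rhs = upper_gamma_integral(K, lam)
print(lhs, rhs)
```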


5.3.3 Beta Distribution 
The integral


(37)  $B(\alpha, \beta) = \int_{0+}^{1-} x^{\alpha-1}(1 - x)^{\beta-1}\,dx$

converges for $\alpha > 0$, $\beta > 0$ and is called a beta function. For $\alpha \le 0$ or $\beta \le 0$ the
integral in (37) diverges. It is easy to see that for $\alpha > 0$, $\beta > 0$,


(38)  $B(\alpha, \beta) = B(\beta, \alpha),$
(39)  $B(\alpha, \beta) = \int_{0+}^{\infty} x^{\alpha-1}(1 + x)^{-\alpha-\beta}\,dx,$
and
(40)  $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}.$

It follows that

(41)  $f(x) = \begin{cases} \dfrac{x^{\alpha-1}(1 - x)^{\beta-1}}{B(\alpha, \beta)}, & 0 < x < 1, \\ 0, & \text{otherwise}, \end{cases}$

defines a PDF.


Definition 4. An RV $X$ with PDF given by (41) is said to have a beta distribution
with parameters $\alpha$ and $\beta$, $\alpha > 0$, $\beta > 0$. We will write $X \sim B(\alpha, \beta)$ for a beta
variable with density (41).


Figure 3 gives graphs of some beta PDFs. 


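Fig. 3. Beta density functions.

The DF of a $B(\alpha, \beta)$ RV is given by


(42)  $F(x) = \begin{cases} 0, & x \le 0, \\ [B(\alpha, \beta)]^{-1} \displaystyle\int_{0+}^{x} y^{\alpha-1}(1 - y)^{\beta-1}\,dy, & 0 < x < 1, \\ 1, & x \ge 1. \end{cases}$


If $n$ is a positive number, then


(43)  $EX^n = \dfrac{1}{B(\alpha, \beta)} \displaystyle\int_{0}^{1} x^{n+\alpha-1}(1 - x)^{\beta-1}\,dx = \dfrac{B(n + \alpha, \beta)}{B(\alpha, \beta)} = \dfrac{\Gamma(n + \alpha)\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(n + \alpha + \beta)},$


using (40). In particular,


(44)  $EX = \dfrac{\alpha}{\alpha + \beta}$
and
(45)  $\operatorname{var}(X) = \dfrac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$


For the MGF of $X \sim B(\alpha, \beta)$, we have

(46)  $M(t) = \dfrac{1}{B(\alpha, \beta)} \displaystyle\int_{0}^{1} e^{tx} x^{\alpha-1}(1 - x)^{\beta-1}\,dx.$


Since moments of all order exist, and $E|X|^j < 1$ for all $j$, we have


(47)  $M(t) = \sum_{j=0}^{\infty} \dfrac{t^j}{j!}\, EX^j = \sum_{j=0}^{\infty} \dfrac{t^j}{\Gamma(j + 1)} \cdot \dfrac{\Gamma(\alpha + j)\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + j)\Gamma(\alpha)}.$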
Remark 1. Note that in the special case where $\alpha = \beta = 1$ we get the uniform
distribution on $(0, 1)$.


Remark 2. If $X$ is a beta RV with parameters $\alpha$ and $\beta$, then $1 - X$ is a beta
variate with parameters $\beta$ and $\alpha$. In particular, $X$ is $B(\alpha, \alpha)$ if and only if $1 - X$ is
$B(\alpha, \alpha)$. A special case is the uniform distribution on $(0, 1)$. If $X$ and $1 - X$ have the
same distribution, it does not follow that $X$ has to be $B(\alpha, \alpha)$. All this entails is that
the PDF satisfies
$f(x) = f(1 - x), \qquad 0 < x < 1.$
Take
$f(x) = \dfrac{1}{B(\alpha, \beta) + B(\beta, \alpha)} \left[ x^{\alpha-1}(1 - x)^{\beta-1} + (1 - x)^{\alpha-1} x^{\beta-1} \right], \qquad 0 < x < 1.$

Example 3. Let $X$ be distributed with PDF


$f(x) = \begin{cases} 12x^2(1 - x), & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$

Then $X \sim B(3, 2)$ and

$EX^n = \dfrac{\Gamma(n + 3)\Gamma(5)}{\Gamma(3)\Gamma(n + 5)} = \dfrac{4!}{2!} \cdot \dfrac{(n + 2)!}{(n + 4)!} = \dfrac{12}{(n + 4)(n + 3)},$

$EX = \dfrac{12}{20}, \qquad \operatorname{var}(X) = \dfrac{6}{5^2 \cdot 6} = \dfrac{1}{25},$

$M(t) = \sum_{j=0}^{\infty} \dfrac{t^j}{j!} \cdot \dfrac{(j + 2)!\,4!}{(j + 4)!\,2!} = \sum_{j=0}^{\infty} \dfrac{12}{(j + 4)(j + 3)} \cdot \dfrac{t^j}{j!},$


and

$P\{0.2 < X < 0.5\} = 12 \displaystyle\int_{0.2}^{0.5} (x^2 - x^3)\,dx = 0.285.$
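The arithmetic in Example 3 can be reproduced directly. In the sketch below, `beta_moment` evaluates the gamma-function form of (43) and `cdf` is the closed-form antiderivative of $12x^2(1 - x)$; both helper names are introduced here for illustration.

```python
import math

def beta_moment(n, a, b):
    """EX^n for X ~ B(a, b), using the gamma-function form in (43)."""
    return (math.gamma(n + a) * math.gamma(a + b)) / (math.gamma(a) * math.gamma(n + a + b))

a, b = 3, 2
mean = beta_moment(1, a, b)
var = beta_moment(2, a, b) - mean ** 2

def cdf(x):
    """DF of the PDF 12 x^2 (1 - x) on (0, 1): 12 (x^3/3 - x^4/4)."""
    return 12 * (x ** 3 / 3 - x ** 4 / 4)

prob = cdf(0.5) - cdf(0.2)          # P{0.2 < X < 0.5}
print(mean, var, prob)
```

The integral $\int_{0.2}^{0.5}(x^2 - x^3)\,dx$ alone is about $0.0238$; multiplying by the normalizing constant $12$ gives the probability, about $0.285$.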


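Theorem 14. Let $X$ and $Y$ be independent $G(\alpha_1, \beta)$ and $G(\alpha_2, \beta)$ RVs, respectively.
Then $X/(X + Y)$ is a $B(\alpha_1, \alpha_2)$ RV.


Let $X_1, X_2, \ldots, X_n$ be iid RVs with the uniform distribution on $[0, 1]$. Let $X_{(k)}$
be the $k$th-order statistic.


Theorem 15. The RV $X_{(k)}$ has a beta distribution with parameters $\alpha = k$ and
$\beta = n - k + 1$.


Proof. Let $X$ be the number of $X_i$'s that lie in $[0, t]$. Then $X$ is $b(n, t)$. We have


$P\{X_{(k)} \le t\} = P\{X \ge k\} = \sum_{j=k}^{n} \binom{n}{j} t^{j}(1 - t)^{n-j}.$

Also,

$\dfrac{d}{dt} P\{X \ge k\} = \sum_{j=k}^{n} \binom{n}{j} \left\{ j t^{j-1}(1 - t)^{n-j} - (n - j) t^{j}(1 - t)^{n-j-1} \right\}$

$\phantom{\dfrac{d}{dt} P\{X \ge k\}} = \sum_{j=k}^{n} \left[ n\binom{n-1}{j-1} t^{j-1}(1 - t)^{n-j} - n\binom{n-1}{j} t^{j}(1 - t)^{n-j-1} \right]$

$\phantom{\dfrac{d}{dt} P\{X \ge k\}} = n\binom{n-1}{k-1} t^{k-1}(1 - t)^{n-k}.$


On integration, we get


$P\{X_{(k)} \le t\} = n\binom{n-1}{k-1} \displaystyle\int_{0}^{t} x^{k-1}(1 - x)^{n-k}\,dx,$


as asserted.


Remark 3. Note that we have shown that, if $X$ is $b(n, p)$, then


(48)  $1 - P\{X < k\} = n\binom{n-1}{k-1} \displaystyle\int_{0}^{p} x^{k-1}(1 - x)^{n-k}\,dx,$


which expresses the DF of $X$ in terms of the DF of a $B(k, n - k + 1)$ RV.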
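Relation (48) between the binomial tail and the beta DF is straightforward to confirm numerically. The following sketch compares a direct binomial sum with a composite Simpson approximation of the integral; the helper names and the values of $n$, $k$, $p$ are illustrative assumptions.

```python
import math

def binomial_upper_tail(n, k, p):
    """P{X >= k} for X ~ b(n, p), summed from the PMF."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def beta_integral(n, k, p, steps=2_000):
    """n * C(n-1, k-1) * integral_0^p x^(k-1) (1-x)^(n-k) dx by composite Simpson.

    `steps` must be even.
    """
    h = p / steps
    g = lambda x: x ** (k - 1) * (1 - x) ** (n - k)
    total = g(0.0) + g(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return n * math.comb(n - 1, k - 1) * total * h / 3

n, k, p = 10, 4, 0.35              # illustrative values
tail = binomial_upper_tail(n, k, p)
integral = beta_integral(n, k, p)
print(tail, integral)
```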


Theorem 16. Let $X_1, X_2, \ldots, X_n$ be independent RVs. Then $X_1, X_2, \ldots, X_n$
are iid $B(\alpha, 1)$ RVs if and only if $X_{(n)} \sim B(\alpha n, 1)$.


5.3.4 Cauchy Distribution 


Definition 5. An RV $X$ is said to have a Cauchy distribution with parameters $\mu$
and $\theta$ if its PDF is given by


(49)  $f(x) = \dfrac{\mu}{\pi} \cdot \dfrac{1}{\mu^2 + (x - \theta)^2}, \qquad -\infty < x < \infty, \quad \mu > 0.$


We will write $X \sim C(\mu, \theta)$ for a Cauchy RV with density (49).


Figure 4 gives the graph of a Cauchy PDF.
We first check that (49) in fact defines a PDF. Substituting $y = (x - \theta)/\mu$, we get


$\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = \dfrac{1}{\pi} \int_{-\infty}^{\infty} \dfrac{dy}{1 + y^2} = \dfrac{2}{\pi} \left(\tan^{-1} y\right)_{0}^{\infty} = 1.$

Fig. 4. Cauchy density function.


The DF of a $C(1, 0)$ RV is given by

(50)  $F(x) = \dfrac{1}{2} + \dfrac{1}{\pi} \tan^{-1} x, \qquad -\infty < x < \infty.$


Theorem 17. Let $X$ be a Cauchy RV with parameters $\mu$ and $\theta$. The moments of
order $< 1$ exist, but the moments of order $\ge 1$ do not exist for the RV $X$.


Proof. It suffices to consider the PDF


$f(x) = \dfrac{1}{\pi} \cdot \dfrac{1}{1 + x^2}, \qquad -\infty < x < \infty.$

We have

$E|X|^{\alpha} = \dfrac{2}{\pi} \displaystyle\int_{0}^{\infty} \dfrac{x^{\alpha}}{1 + x^2}\,dx,$


and, letting $z = 1/(1 + x^2)$ in the integral, we get

$E|X|^{\alpha} = \dfrac{1}{\pi} \displaystyle\int_{0}^{1} z^{[(1-\alpha)/2]-1}(1 - z)^{[(1+\alpha)/2]-1}\,dz,$


which converges for $\alpha < 1$ and diverges for $\alpha \ge 1$. This completes the proof of the
theorem.


It follows from Theorem 17 that the MGF of a Cauchy RV does not exist. This
creates some manipulative problems. We note, however, that the cf of $X \sim C(\mu, \theta)$
is given by


(51)  $\phi(t) = e^{i\theta t - \mu|t|}.$

Theorem 18. Let $X \sim C(\mu_1, \theta_1)$ and $Y \sim C(\mu_2, \theta_2)$ be independent RVs. Then
$X + Y$ is a $C(\mu_1 + \mu_2, \theta_1 + \theta_2)$ RV.


Proof. For notational convenience we will prove the result in the special case
where $\mu_1 = \mu_2 = 1$ and $\theta_1 = \theta_2 = 0$, that is, where $X$ and $Y$ have the common PDF


$f(x) = \dfrac{1}{\pi} \cdot \dfrac{1}{1 + x^2}, \qquad -\infty < x < \infty.$


The proof in the general case follows along the same lines. If $Z = X + Y$, the PDF
of $Z$ is given by


$f_Z(z) = \dfrac{1}{\pi^2} \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{1 + x^2} \cdot \dfrac{1}{1 + (z - x)^2}\,dx.$

Now

$\dfrac{1}{(1 + x^2)[1 + (z - x)^2]} = \dfrac{1}{z^2(z^2 + 4)} \left[ \dfrac{2zx}{1 + x^2} + \dfrac{z^2}{1 + x^2} + \dfrac{2z^2 - 2zx}{1 + (z - x)^2} + \dfrac{z^2}{1 + (z - x)^2} \right],$

so that

$f_Z(z) = \dfrac{1}{\pi^2} \cdot \dfrac{1}{z^2(z^2 + 4)} \left[ z \log \dfrac{1 + x^2}{1 + (z - x)^2} + z^2 \tan^{-1} x + z^2 \tan^{-1}(x - z) \right]_{-\infty}^{\infty}$

$\phantom{f_Z(z)} = \dfrac{1}{\pi} \cdot \dfrac{2}{z^2 + 2^2}, \qquad -\infty < z < \infty.$


It follows that if $X$ and $Y$ are iid $C(1, 0)$ RVs, then $X + Y$ is a $C(2, 0)$ RV. We note
that the result follows effortlessly from (51).


Corollary. Let $X_1, X_2, \ldots, X_n$ be independent Cauchy RVs, $X_k \sim C(\mu_k, \theta_k)$,
$k = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^{n} X_k$ is a $C(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \theta_k)$ RV.


In particular, if $X_1, X_2, \ldots, X_n$ are iid $C(1, 0)$ RVs, $n^{-1}S_n$ is also a $C(1, 0)$ RV.
This is a remarkable result, the importance of which will become clear in Chapter 6.
Actually, this property uniquely characterizes the Cauchy distribution. If $F$ is a nondegenerate
DF with the property that $n^{-1}S_n$ also has DF $F$, then $F$ must be a Cauchy
distribution (see Thompson [112, p. 112]).
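The fact that the sample mean of iid $C(1, 0)$ RVs is again $C(1, 0)$ shows up clearly in simulation: averaging does not concentrate the distribution. The sketch below generates Cauchy variates as the tangent of a uniform angle on $(-\pi/2, \pi/2)$ (a standard construction, stated as Theorem 20 later in this section); the helper name, seed, and sample sizes are illustrative assumptions.

```python
import math
import random

def cauchy(rng):
    """One C(1, 0) draw as the tangent of a uniform angle on (-pi/2, pi/2)."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(2718)           # fixed seed, chosen arbitrarily
n, reps = 50, 20_000                # sample size per mean, number of means
means = [sum(cauchy(rng) for _ in range(n)) / n for _ in range(reps)]

# For C(1, 0), P{|X| <= 1} = F(1) - F(-1) = 1/2 by (50); the same should
# hold for the mean of n observations, whatever n is.
frac = sum(abs(m) <= 1.0 for m in means) / reps
print(frac)
```

For a distribution with finite variance the fraction of sample means in $[-1, 1]$ would approach 1 as $n$ grows; here it should stay near $1/2$.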

The proof of the following result is simple. 


Theorem 19. Let $X$ be $C(\mu, 0)$. Then $\lambda/X$, where $\lambda$ is a constant, is a $C(|\lambda|/\mu, 0)$
RV.


Corollary. $X$ is $C(1, 0)$ if and only if $1/X$ is $C(1, 0)$.

We emphasize that if $X$ and $1/X$ have the same PDF on $(-\infty, \infty)$, it does not
follow* that $X$ is $C(1, 0)$, for let $X$ be an RV with PDF


$f(x) = \begin{cases} \dfrac{1}{4} & \text{if } |x| \le 1, \\ \dfrac{1}{4x^2} & \text{if } |x| > 1. \end{cases}$


Then $X$ and $1/X$ have the same PDF, as can easily be checked.

Theorem 20. Let $X$ be a $U(-\pi/2, \pi/2)$ RV. Then $Y = \tan X$ is a Cauchy RV.


Many important properties of the Cauchy distribution can be derived from this
result (see Pitman and Williams [78]).


5.3.5 Normal Distribution (Gaussian Law) 


One of the most important distributions in the study of probability and mathematical 
statistics is the normal distribution, which we examine presently. 


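Definition 6. An RV $X$ is said to have a standard normal distribution if its PDF
is given by


(52)  $\varphi(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$


We first check that $\varphi$ defines a PDF. Let

$I = \displaystyle\int_{-\infty}^{\infty} e^{-x^2/2}\,dx.$

Then

$0 < e^{-x^2/2} < e^{-|x|+1}, \qquad -\infty < x < \infty,$

$\displaystyle\int_{-\infty}^{\infty} e^{-|x|+1}\,dx = 2e,$


and it follows that $I$ exists. We have


$I = \displaystyle\int_{0}^{\infty} y^{-1/2} e^{-y/2}\,dy = \Gamma\left(\tfrac{1}{2}\right) 2^{1/2} = \sqrt{2\pi}.$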


*Menon [71] has shown that we need the condition that both $X$ and $1/X$ be stable to conclude that $X$
is Cauchy.
A nondegenerate distribution function $F$ is said to be stable if for two iid RVs $X_1$, $X_2$ with common
DF $F$, and given constants $a_1, a_2 > 0$, we can find $\alpha > 0$ and $\beta(a_1, a_2)$ such that the RV


$X_3 = \alpha^{-1}(a_1 X_1 + a_2 X_2 - \beta)$


again has the same distribution $F$. Examples are the Cauchy (see the corollary to Theorem 18) and normal
(discussed in Section 5.3.5) distributions.

Thus $\int_{-\infty}^{\infty} \varphi(x)\,dx = 1$, as required.
Let us write $Y = \sigma X + \mu$, where $\sigma > 0$. Then the PDF of $Y$ is given by


(53)  $\psi(y) = \dfrac{1}{\sigma}\,\varphi\left(\dfrac{y - \mu}{\sigma}\right) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2/2\sigma^2}, \qquad -\infty < y < \infty; \quad \sigma > 0, \quad -\infty < \mu < \infty.$

Definition 7. An RV $X$ is said to have a normal distribution with parameters $\mu$
($-\infty < \mu < \infty$) and $\sigma$ ($> 0$) if its PDF is given by (53).


If $X$ is a normally distributed RV with parameters $\mu$ and $\sigma$, we will write $X \sim
\mathcal{N}(\mu, \sigma^2)$. In this notation, $\varphi$ defined by (52) is the PDF of an $\mathcal{N}(0, 1)$ RV. The DF
of an $\mathcal{N}(0, 1)$ RV will be denoted by $\Phi(x)$, where


(54)  $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} e^{-u^2/2}\,du.$


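Clearly, if $X \sim \mathcal{N}(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim \mathcal{N}(0, 1)$. $Z$ is called a standard
normal RV. For the MGF of an $\mathcal{N}(\mu, \sigma^2)$ RV, we have


(55)  $M(t) = \dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{-\infty}^{\infty} \exp\left\{ tx - \dfrac{(x - \mu)^2}{2\sigma^2} \right\} dx$

$\phantom{(55)  M(t)} = \dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{-\infty}^{\infty} \exp\left\{ -\dfrac{(x - \mu - \sigma^2 t)^2}{2\sigma^2} + \mu t + \dfrac{\sigma^2 t^2}{2} \right\} dx$

$\phantom{(55)  M(t)} = \exp\left( \mu t + \dfrac{\sigma^2 t^2}{2} \right)$


for all real values of $t$. Moments of all order exist and may be computed from the
MGF. Thus


(56)  $EX = M'(t)|_{t=0} = (\mu + \sigma^2 t)M(t)|_{t=0} = \mu$
and
(57)  $EX^2 = M''(t)|_{t=0} = [M(t)\sigma^2 + (\mu + \sigma^2 t)^2 M(t)]_{t=0} = \sigma^2 + \mu^2.$

Thus
(58)  $\operatorname{var}(X) = \sigma^2.$


Clearly, the central moments of odd order are all zero. The central moments of
even order are as follows:


(59)  $E(X - \mu)^{2n} = \dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{-\infty}^{\infty} x^{2n} e^{-x^2/2\sigma^2}\,dx$  ($n$ is a positive integer)

$\phantom{(59)  E(X - \mu)^{2n}} = \dfrac{2^n \sigma^{2n}}{\sqrt{\pi}}\, \Gamma\left(n + \tfrac{1}{2}\right)$

$\phantom{(59)  E(X - \mu)^{2n}} = [(2n - 1)(2n - 3) \cdots 3 \cdot 1]\, \sigma^{2n}.$


As for the absolute moment of order $\alpha$, for a standard normal RV $Z$ we have

(60)  $E|Z|^{\alpha} = \dfrac{1}{\sqrt{2\pi}}\, 2 \displaystyle\int_{0}^{\infty} z^{\alpha} e^{-z^2/2}\,dz$

$\phantom{(60)  E|Z|^{\alpha}} = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{0}^{\infty} y^{[(\alpha+1)/2]-1} e^{-y/2}\,dy$

$\phantom{(60)  E|Z|^{\alpha}} = \dfrac{\Gamma[(\alpha + 1)/2]\, 2^{\alpha/2}}{\sqrt{\pi}}.$

As remarked earlier, the normal distribution is one of the most important distributions
in probability and statistics, and for this reason the standard normal distribution
is available in tabular form. Table ST2 at the end of the book gives the probability
$P\{Z \ge z\}$ for various values of $z$ ($> 0$) in the tail of an $\mathcal{N}(0, 1)$ RV. In this book we
write $z_{\alpha}$ for the value of $Z$ that satisfies $\alpha = P\{Z \ge z_{\alpha}\}$, $0 \le \alpha \le 1$.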


Example 4. By Chebychev's inequality, if $E|X|^2 < \infty$, $EX = \mu$, and $\operatorname{var}(X) =
\sigma^2$, then


$P\{|X - \mu| \ge K\sigma\} \le \dfrac{1}{K^2}.$


For $K = 2$, we get $P\{|X - \mu| \ge K\sigma\} \le 0.25$, and for $K = 3$, we have $P\{|X - \mu| \ge
K\sigma\} \le \tfrac{1}{9}$. If $X$ is, in particular, $\mathcal{N}(\mu, \sigma^2)$, then


$P\{|X - \mu| \ge K\sigma\} = P\{|Z| \ge K\},$
where $Z$ is $\mathcal{N}(0, 1)$. From Table ST2,


$P\{|Z| \ge 1\} = 0.318, \quad P\{|Z| \ge 2\} = 0.046, \quad$ and $\quad P\{|Z| \ge 3\} = 0.003.$

Thus practically all the distribution is concentrated within three standard deviations
of the mean.
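The comparison in Example 4 between the Chebychev bound and the exact normal tail can be reproduced without tables. The sketch below uses the identity $P\{|Z| \ge k\} = \operatorname{erfc}(k/\sqrt{2})$ for a standard normal $Z$; the helper name is an assumption made for illustration.

```python
import math

def normal_two_sided_tail(k):
    """P{|Z| >= k} for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3):
    chebyshev = min(1.0, 1.0 / k ** 2)        # distribution-free upper bound
    print(k, round(chebyshev, 4), round(normal_two_sided_tail(k), 4))
```

The computed tails agree with the tabled values up to rounding in the table, and they are far smaller than the Chebychev bounds for $K = 2$ and $K = 3$.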


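Example 5. Let $X \sim \mathcal{N}(3, 4)$. Then


$P\{2 < X \le 5\} = P\left\{ \dfrac{2 - 3}{2} < \dfrac{X - 3}{2} \le \dfrac{5 - 3}{2} \right\} = P\{-0.5 < Z \le 1\}$

$\phantom{P\{2 < X \le 5\}} = P\{Z \le 1\} - P\{Z \le -0.5\}$
$\phantom{P\{2 < X \le 5\}} = 0.841 - P\{Z \ge 0.5\}$
$\phantom{P\{2 < X \le 5\}} = 0.841 - 0.309 = 0.532.$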


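Theorem 21 (Feller [22, p. 175]). Let $Z$ be a standard normal RV. Then


(61)  $P\{Z \ge x\} \approx \dfrac{1}{\sqrt{2\pi}\,x}\, e^{-x^2/2}$ as $x \to \infty$.

More precisely, for every $x > 0$,

(62)  $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \left( \dfrac{1}{x} - \dfrac{1}{x^3} \right) < P\{Z \ge x\} < \dfrac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}.$


Proof. We have

(63)  $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{x}^{\infty} e^{-y^2/2} \left( 1 - \dfrac{3}{y^4} \right) dy = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \left( \dfrac{1}{x} - \dfrac{1}{x^3} \right)$


and


(64)  $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{x}^{\infty} e^{-y^2/2} \left( 1 + \dfrac{1}{y^2} \right) dy = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, \dfrac{1}{x},$

as can be checked on differentiation. Approximation (61) follows immediately.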
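The bounds in (62) are easy to inspect numerically. The sketch below evaluates the exact tail with `math.erfc` and prints it between the two bounds, together with the ratio of the tail to the upper bound, which should approach 1 as $x$ grows. The helper names and the chosen values of $x$ are illustrative.

```python
import math

def upper_tail(x):
    """P{Z >= x} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bounds(x):
    """Lower and upper bounds on P{Z >= x} from (62)."""
    phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 / x - 1.0 / x ** 3), phi / x

for x in (1.0, 2.0, 4.0, 6.0):
    lo, hi = bounds(x)
    print(x, lo, upper_tail(x), hi, upper_tail(x) / hi)
```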


Theorem 22. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_k \sim \mathcal{N}(\mu_k, \sigma_k^2)$,
$k = 1, 2, \ldots, n$. Then $S_n = \sum_{k=1}^{n} X_k$ is an $\mathcal{N}(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \sigma_k^2)$ RV.


Corollary 1. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(\mu, \sigma^2)$ RVs, then $S_n$ is an $\mathcal{N}(n\mu, n\sigma^2)$
RV and $n^{-1}S_n$ is an $\mathcal{N}(\mu, \sigma^2/n)$ RV.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(0, 1)$ RVs, then $n^{-1/2}S_n$ is also an
$\mathcal{N}(0, 1)$ RV.


We remark that if $X_1, X_2, \ldots, X_n$ are iid RVs with $EX = 0$, $EX^2 = 1$ such that
$n^{-1/2}S_n$ also has the same distribution for each $n = 1, 2, \ldots$, that distribution can
only be $\mathcal{N}(0, 1)$. This characterization of the normal distribution will become clear
when we study the central limit theorem in Chapter 6.


Theorem 23. Let X and Y be independent RVs. Then X + Y is normally dis- 
tributed if and only if X and Y are both normal. 


If X and Y are independent normal RVs, X + Y is normal by Theorem 22. The 
converse is due to Cramér [15] and will not be proved here. 


Theorem 24. Let $X$ and $Y$ be independent RVs with common $\mathcal{N}(0, 1)$ distribution.
Then $X + Y$ and $X - Y$ are independent.


The converse is due to Bernstein [3] and is stated here without proof.

Theorem 25. If $X$ and $Y$ are independent RVs with the same distribution, and
if $Z_1 = X + Y$ and $Z_2 = X - Y$ are independent, all RVs $X$, $Y$, $Z_1$, and $Z_2$ are
normally distributed.

The following result generalizes Theorem 24. 

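Theorem 26. If $X_1, X_2, \ldots, X_n$ are independent normal RVs and $\sum_{i=1}^{n} a_i b_i
\operatorname{var}(X_i) = 0$, then $L_1 = \sum_{i=1}^{n} a_i X_i$ and $L_2 = \sum_{i=1}^{n} b_i X_i$ are independent. Here
$a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are fixed (nonzero) real numbers.


Proof. Let $\operatorname{var}(X_i) = \sigma_i^2$, and assume without loss of generality that $EX_i = 0$,
$i = 1, 2, \ldots, n$. For any real numbers $\alpha$, $\beta$, and $t$,


$E e^{(\alpha L_1 + \beta L_2)t} = E \exp\left\{ t \sum_{i=1}^{n} (\alpha a_i + \beta b_i) X_i \right\}$

$\phantom{E e^{(\alpha L_1 + \beta L_2)t}} = \prod_{i=1}^{n} \exp\left\{ \dfrac{t^2}{2} (\alpha a_i + \beta b_i)^2 \sigma_i^2 \right\}$

$\phantom{E e^{(\alpha L_1 + \beta L_2)t}} = \prod_{i=1}^{n} \exp\left\{ \dfrac{t^2}{2} \alpha^2 a_i^2 \sigma_i^2 \right\} \prod_{i=1}^{n} \exp\left\{ \dfrac{t^2}{2} \beta^2 b_i^2 \sigma_i^2 \right\}$  (since $\sum_{i=1}^{n} a_i b_i \sigma_i^2 = 0$)

$\phantom{E e^{(\alpha L_1 + \beta L_2)t}} = \prod_{i=1}^{n} E e^{t\alpha a_i X_i} \prod_{i=1}^{n} E e^{t\beta b_i X_i}$

$\phantom{E e^{(\alpha L_1 + \beta L_2)t}} = E \exp\left( t\alpha \sum_{i=1}^{n} a_i X_i \right) E \exp\left( t\beta \sum_{i=1}^{n} b_i X_i \right) = E e^{t\alpha L_1}\, E e^{t\beta L_2}.$

Thus we have shown that

$M(\alpha t, \beta t) = M(\alpha t, 0)\,M(0, \beta t)$ for all $\alpha$, $\beta$, $t$.

It follows that $L_1$ and $L_2$ are independent.


Corollary. If $X_1$, $X_2$ are independent $\mathcal{N}(\mu_1, \sigma^2)$ and $\mathcal{N}(\mu_2, \sigma^2)$ RVs, then $X_1 -
X_2$ and $X_1 + X_2$ are independent. (This gives Theorem 24.)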


Darmois [19] and Skitovitch [104] provided the converse of Theorem 26, which 
we state without proof. 


Theorem 27. If $X_1, X_2, \ldots, X_n$ are independent RVs, $a_1, a_2, \ldots, a_n$, $b_1, b_2,
\ldots, b_n$ are real numbers none of which equals zero, and if the linear forms


$L_1 = \sum_{i=1}^{n} a_i X_i$ and $L_2 = \sum_{i=1}^{n} b_i X_i$

are independent, all the RVs are normally distributed.


Corollary. If $X$ and $Y$ are independent RVs such that $X + Y$ and $X - Y$ are
independent, $X$, $Y$, $X + Y$, and $X - Y$ are all normal.


Yet another result of this type is the following theorem.


Theorem 28. Let $X_1, X_2, \ldots, X_n$ be iid RVs. Then the common distribution is
normal if and only if


$S_n = \sum_{k=1}^{n} X_k$ and $Y_n = \sum_{i=1}^{n} (X_i - n^{-1}S_n)^2$


are independent.

In Chapter 7 we prove the necessity part of this result, which is basic to the theory
of t-tests in statistics (Chapter 10; see also Example 4.4.6). The sufficiency part was
proved by Lukacs [65], and we will not prove it here.

Theorem 29. $X \sim \mathcal{N}(0, 1) \Rightarrow X^2 \sim \chi^2(1)$.

See Example 2.5.7 for the proof.

Corollary 1. If $X \sim \mathcal{N}(\mu, \sigma^2)$, the RV $Z^2 = (X - \mu)^2/\sigma^2$ is $\chi^2(1)$.


Corollary 2. If $X_1, X_2, \ldots, X_n$ are independent RVs and $X_k \sim \mathcal{N}(\mu_k, \sigma_k^2)$, $k =
1, 2, \ldots, n$, then $\sum_{k=1}^{n} (X_k - \mu_k)^2/\sigma_k^2$ is $\chi^2(n)$.

Theorem 30. Let $X$ and $Y$ be iid $\mathcal{N}(0, \sigma^2)$ RVs. Then $X/Y$ is $C(1, 0)$.


For the proof, see Example 2.5.7.

We remark that the converse of this result does not hold; that is, if $Z = X/Y$ is
the quotient of two iid RVs and $Z$ has a $C(1, 0)$ distribution, it does not follow that
$X$ and $Y$ are normal, for take $X$ and $Y$ to be iid with PDF


$f(x) = \dfrac{\sqrt{2}}{\pi} \cdot \dfrac{1}{1 + x^4}, \qquad -\infty < x < \infty.$


We leave the reader to verify that $Z = X/Y$ is $C(1, 0)$.


5.3.6 Some Other Continuous Distributions 


Several other distributions that are related to distributions studied earlier also arise 
in practice. We record briefly some of these and their important characteristics. We 
will use these distributions infrequently. We say that X has a lognormal distribution 
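if $Y = \ln X$ has a normal distribution. The PDF of $X$ is then


(65)  $f(x) = \dfrac{1}{x\sigma\sqrt{2\pi}} \exp\left\{ -\dfrac{(\ln x - \mu)^2}{2\sigma^2} \right\}, \qquad x > 0,$


and $f(x) = 0$ for $x \le 0$, where $-\infty < \mu < \infty$, $\sigma > 0$. In fact for $x > 0$


$P(X \le x) = P(\ln X \le \ln x)$

$\phantom{P(X \le x)} = P(Y \le \ln x) = P\left( \dfrac{Y - \mu}{\sigma} \le \dfrac{\ln x - \mu}{\sigma} \right)$

$\phantom{P(X \le x)} = \Phi\left( \dfrac{\ln x - \mu}{\sigma} \right),$


where $\Phi$ is the DF of an $\mathcal{N}(0, 1)$ RV, which easily leads to (65). It is easily seen that
for $n \ge 0$,


(66)  $EX^n = \exp\left( n\mu + \dfrac{n^2\sigma^2}{2} \right),$

$EX = \exp\left( \mu + \dfrac{\sigma^2}{2} \right), \qquad \operatorname{var}(X) = \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2).$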


The MGF of X does not exist. 
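The lognormal mean and variance in (66) can be checked by exponentiating simulated normal variates. The parameter values, seed, and sample size below are arbitrary choices for illustration; `random.gauss(mu, sigma)` takes the standard deviation, not the variance.

```python
import math
import random

mu, sigma = 0.2, 0.5               # illustrative parameter values
rng = random.Random(55)            # fixed seed, chosen arbitrarily
n = 200_000

# X = exp(Y) with Y ~ N(mu, sigma^2); gauss takes the standard deviation
xs = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(mean, math.exp(mu + sigma ** 2 / 2))
print(var, math.exp(2 * mu + 2 * sigma ** 2) - math.exp(2 * mu + sigma ** 2))
```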
We say that the RV $X$ has a Pareto distribution with parameters $\theta > 0$ and $\alpha > 0$
if its PDF is given by


(67)  $f(x) = \dfrac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha+1}}, \qquad x > 0,$

and zero otherwise. Here $\theta$ is a scale parameter and $\alpha$ is a shape parameter. It is easy
to check that

$F(x) = P(X \le x) = 1 - \dfrac{\theta^{\alpha}}{(\theta + x)^{\alpha}}, \qquad x > 0,$

(68)  $EX = \dfrac{\theta}{\alpha - 1}, \quad \alpha > 1, \quad$ and $\quad \operatorname{var}(X) = \dfrac{\alpha\theta^2}{(\alpha - 2)(\alpha - 1)^2}$

for $\alpha > 2$. The MGF of $X$ does not exist since all moments of $X$ do not.


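Suppose that $X$ has a Pareto distribution with parameters $\theta$ and $\alpha$. Writing $Y =
\ln(X/\theta)$, we see that $Y$ has PDF


(69)  $f_Y(y) = \dfrac{\alpha e^{y}}{(1 + e^{y})^{\alpha+1}}, \qquad -\infty < y < \infty,$


and DF
$F_Y(y) = 1 - (1 + e^{y})^{-\alpha}$ for all $y$.


The PDF in (69) is known as a logistic distribution. We introduce location and scale
parameters $\mu$ and $\sigma$ by writing $Z = \mu + \sigma Y$, taking $\alpha = 1$, and then the PDF of $Z$
is easily seen to be


(70)  $f_Z(z) = \dfrac{1}{\sigma} \cdot \dfrac{\exp[(z - \mu)/\sigma]}{\{1 + \exp[(z - \mu)/\sigma]\}^2}$


for all real $z$. This is the PDF of a logistic RV with location and scale parameters $\mu$
and $\sigma$. We leave the reader to check that


$F_Z(z) = \exp\left( \dfrac{z - \mu}{\sigma} \right) \left[ 1 + \exp\left( \dfrac{z - \mu}{\sigma} \right) \right]^{-1},$

(71)  $EZ = \mu, \qquad \operatorname{var}(Z) = \dfrac{\pi^2\sigma^2}{3},$

$M_Z(t) = \exp(\mu t)\,\Gamma(1 - \sigma t)\,\Gamma(1 + \sigma t), \qquad |t| < \dfrac{1}{\sigma}.$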

The Pareto distribution is also related to an exponential distribution. Let $X$ have a Pareto
PDF of the form


(72)  $f_X(x) = \dfrac{\alpha\sigma^{\alpha}}{x^{\alpha+1}}, \qquad x > \sigma,$


and zero otherwise. A simple transformation leads to PDF (72) from (67). Then it
is easily seen that $Y = \ln(X/\sigma)$ has an exponential distribution with mean $1/\alpha$.
Thus some properties of the exponential distribution that are preserved under monotone
transformations can be derived for the Pareto PDF (72) by using the logarithmic
transformation.

Some other distributions are related to the gamma distribution. Suppose that $X \sim
G(1, \beta)$. Let $Y = X^{1/\alpha}$, $\alpha > 0$. Then $Y$ has PDF


(73)  $f_Y(y) = \dfrac{\alpha}{\beta}\, y^{\alpha-1} \exp\left( -\dfrac{y^{\alpha}}{\beta} \right), \qquad y > 0,$


and zero otherwise. The RV $Y$ is said to have a Weibull distribution. We leave the
reader to show that


$F_Y(y) = 1 - \exp\left( -\dfrac{y^{\alpha}}{\beta} \right), \qquad y > 0,$

(74)  $EY^n = \beta^{n/\alpha}\, \Gamma\left( 1 + \dfrac{n}{\alpha} \right), \qquad EY = \beta^{1/\alpha}\, \Gamma\left( 1 + \dfrac{1}{\alpha} \right),$

$\operatorname{var}(Y) = \beta^{2/\alpha} \left\{ \Gamma\left( 1 + \dfrac{2}{\alpha} \right) - \Gamma^2\left( 1 + \dfrac{1}{\alpha} \right) \right\}.$


The MGF of $Y$ exists only for $\alpha \ge 1$, but for $\alpha > 1$ it does not have a form useful in
applications. The special case $\alpha = 2$ and $\beta = \theta^2$ is known as a Rayleigh distribution.

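Suppose that $X$ has a Weibull distribution with PDF (73). Let $Y = \ln X$. Then $Y$
has DF


$F_Y(y) = 1 - \exp\left( -\dfrac{1}{\beta}\, e^{\alpha y} \right), \qquad -\infty < y < \infty.$


Setting $\theta = (1/\alpha) \ln \beta$ and $\sigma = 1/\alpha$, we get


(75)  $F_Y(y) = 1 - \exp\left\{ -\exp\left( \dfrac{y - \theta}{\sigma} \right) \right\}$
with PDF
(76)  $f_Y(y) = \dfrac{1}{\sigma} \exp\left\{ \dfrac{y - \theta}{\sigma} - \exp\left( \dfrac{y - \theta}{\sigma} \right) \right\}$


for $-\infty < y < \infty$ and $\sigma > 0$. An RV with PDF (76) is said to have an extreme value
distribution with location and scale parameters $\theta$ and $\sigma$. It can be shown that


(77)  $EY = \theta - \gamma\sigma, \qquad \operatorname{var}(Y) = \dfrac{\pi^2\sigma^2}{6},$

$M_Y(t) = e^{\theta t}\, \Gamma(1 + \sigma t),$
where $\gamma \approx 0.577216$ is the Euler constant.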


The final distribution we consider is also related to a $G(1, \beta)$ RV. Let $f_1$ be the
PDF of $G(1, \beta)$ and $f_2$ the PDF


$f_2(x) = \dfrac{1}{\beta} \exp\left( \dfrac{x}{\beta} \right), \quad x < 0, \qquad = 0$ otherwise.

Clearly, $f_2$ is also an exponential PDF defined on $(-\infty, 0)$. Consider the mixture
PDF


(78)  $f(x) = \tfrac{1}{2}[f_1(x) + f_2(x)], \qquad -\infty < x < \infty.$

Clearly,

(79)  $f(x) = \dfrac{1}{2\beta} \exp\left( -\dfrac{|x|}{\beta} \right), \qquad -\infty < x < \infty,$


and the PDF $f$ defined in (79) is called a Laplace or double exponential PDF. It is
convenient to introduce a location parameter $\mu$ and consider instead the PDF


(80)  $f(x) = \dfrac{1}{2\beta} \exp\left( -\dfrac{|x - \mu|}{\beta} \right), \qquad -\infty < x < \infty,$


where $-\infty < \mu < \infty$, $\beta > 0$. It is easy to see that for an RV $X$ with PDF (80), we have

(81)  $EX = \mu, \qquad \operatorname{var}(X) = 2\beta^2, \qquad$ and $\qquad M(t) = e^{\mu t}[1 - (\beta t)^2]^{-1},$


for $|t| < 1/\beta$.
For completeness let us define a mixture PDF (PMF). Let $g(x \mid \theta)$ be a PDF and
let $h(\theta)$ be a mixing PDF. Then the PDF


(82)  $f(x) = \displaystyle\int g(x \mid \theta)\,h(\theta)\,d\theta$


is called a mixture density function. If $h$ is a PMF with support set $\{\theta_1, \theta_2, \ldots, \theta_k\}$,
then (82) reduces to a finite mixture density function


(83)  $f(x) = \sum_{i=1}^{k} g(x \mid \theta_i)\,h(\theta_i).$


The quantities $h(\theta_i)$ are called mixing proportions. The PDF (78) is an example with
$k = 2$, $h(\theta_1) = h(\theta_2) = \tfrac{1}{2}$, $g(x \mid \theta_1) = f_1(x)$, and $g(x \mid \theta_2) = f_2(x)$.


PROBLEMS 5.3 


1. Prove Theorem 1.

2. Let $X$ be an RV with PMF $p_k = P\{X = k\}$ given below. If $F$ is the corresponding
DF, find the distribution of $F(X)$, in the following cases:
(a) $p_k = \binom{n}{k} p^k (1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$; $0 < p < 1$.
(b) $p_k = e^{-\lambda}(\lambda^k/k!)$, $k = 0, 1, 2, \ldots$; $\lambda > 0$.

3. Let $Y_1 \sim U[0, 1]$, $Y_2 \sim U[0, Y_1]$, $\ldots$, $Y_n \sim U[0, Y_{n-1}]$. Show that
$Y_1 \sim X_1$, $Y_2 \sim X_1X_2$, $\ldots$, $Y_n \sim X_1X_2 \cdots X_n$,
where $X_1, X_2, \ldots, X_n$ are iid $U[0, 1]$ RVs. If $U$ is the number of $Y_1, Y_2, \ldots, Y_n$
in $[t, 1]$, where $0 < t < 1$, show that $U$ has a Poisson distribution with parameter
$-\log t$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, 1]$ RVs. Prove by induction or otherwise that
$S_n = \sum_{k=1}^{n} X_k$ has the PDF
$f_n(x) = [(n - 1)!]^{-1} \sum_{k=0}^{n} (-1)^k \binom{n}{k} (x - k)^{n-1} \varepsilon(x - k),$
where $\varepsilon(x) = 1$ if $x \ge 0$, $= 0$ if $x < 0$.

5. (a) Let $X$ be an RV with PMF $p_j = P(X = x_j)$, $j = 0, 1, 2, \ldots$, and let $F$ be
the DF of $X$. Show that
$EF(X) = \dfrac{1}{2}\left( 1 + \sum_{j=0}^{\infty} p_j^2 \right)$
and
$\operatorname{var} F(X) = \sum_{j=0}^{\infty} p_j q_{j+1}^2 - \dfrac{1}{4}\left( 1 - \sum_{j=0}^{\infty} p_j^2 \right)^2,$
where $q_{j+1} = \sum_{i=j+1}^{\infty} p_i$.
(b) Let $p_j > 0$ for $j = 0, 1, \ldots, N$ and $\sum_{j=0}^{N} p_j = 1$. Show that
$EF(X) \ge \dfrac{N + 2}{2(N + 1)}$
with equality if and only if $p_j = 1/(N + 1)$ for all $j$. (Rohatgi [89])

6. Prove (a) Theorem 6 and its corollary, and (b) Theorem 10.

7. Let $X$ be a nonnegative RV of the continuous type, and let $Y \sim U(0, X)$. Also,
let $Z = X - Y$. Then the RVs $Y$ and $Z$ are independent if and only if $X$ is
$G(2, 1/\lambda)$ for some $\lambda > 0$. (Lamperti [57])

8. Let $X$ and $Y$ be independent RVs with common PDF $f(x) = \beta^{-\alpha}\alpha x^{\alpha-1}$ if $0 <
x < \beta$, and $= 0$ otherwise; $\alpha \ge 1$. Let $U = \min(X, Y)$ and $V = \max(X, Y)$.
Find the joint PDF of $U$ and $V$ and the PDF of $U + V$. Show that $U/V$ and $V$
are independent.

9. Prove Theorem 14.

10. Prove Theorem 8.

11. Prove Theorems 19 and 20.

12. Let $X_1, X_2, \ldots, X_n$ be independent RVs with $X_i \sim C(\mu_i, \lambda_i)$, $i = 1, 2, \ldots, n$.
Show that the RV $X = 1/\sum_{i=1}^{n} X_i^{-1}$ is also a Cauchy RV with parameters
$\mu/(\lambda^2 + \mu^2)$ and $\lambda/(\lambda^2 + \mu^2)$, where
$\lambda = \sum_{i=1}^{n} \dfrac{\lambda_i}{\lambda_i^2 + \mu_i^2}$ and $\mu = \sum_{i=1}^{n} \dfrac{\mu_i}{\lambda_i^2 + \mu_i^2}.$

13. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, 0)$ RVs and $a_i \ne 0$, $b_i$, $i = 1, 2, \ldots, n$, be any
real numbers. Find the distribution of $\sum_{i=1}^{n} 1/(a_i X_i + b_i)$.

14. Suppose that the load of an airplane wing is a random variable $X$ with $\mathcal{N}(1000,
14400)$ distribution. The maximum load that the wing can withstand is an RV $Y$,
which is $\mathcal{N}(1260, 2500)$. If $X$ and $Y$ are independent, find the probability that
the load encountered by the wing is less than its critical load.

15. Let $X \sim \mathcal{N}(0, 1)$. Find the PDF of $Z = 1/X^2$. If $X$ and $Y$ are iid $\mathcal{N}(0, 1)$,
deduce that $U = XY/\sqrt{X^2 + Y^2}$ is $\mathcal{N}(0, \tfrac{1}{4})$.

16. In Problem 15 let $X$ and $Y$ be independent normal RVs with zero means. Show
that $U = XY/\sqrt{X^2 + Y^2}$ is normal. If, in addition, $\operatorname{var}(X) = \operatorname{var}(Y)$, show that
$V = (X^2 - Y^2)/\sqrt{X^2 + Y^2}$ is also normal. Moreover, $U$ and $V$ are independent.
(Shepp [102])

17. Let $X_1, X_2, X_3, X_4$ be independent $\mathcal{N}(0, 1)$. Show that $Y = X_1X_2 + X_3X_4$ has
the PDF $f(y) = \tfrac{1}{2} e^{-|y|}$, $-\infty < y < \infty$.

18. Let $X \sim \mathcal{N}(15, 16)$. Find (a) $P\{X \le 12\}$, (b) $P\{10 \le X \le 17\}$, (c) $P\{10 \le
X \le 19 \mid X \le 17\}$, and (d) $P\{|X - 15| \ge 0.5\}$.

19. Let $X \sim \mathcal{N}(-1, 9)$. Find $x$ such that $P\{X > x\} = 0.38$. Also find $x$ such that
$P\{|X + 1| < x\} = 0.4$.

20. Let $X$ be an RV such that $\log(X - a)$ is $\mathcal{N}(\mu, \sigma^2)$. Show that $X$ has PDF
$f(x) = \begin{cases} \dfrac{1}{\sigma(x - a)\sqrt{2\pi}} \exp\left\{ -\dfrac{[\log(x - a) - \mu]^2}{2\sigma^2} \right\} & \text{if } x > a, \\ 0 & \text{if } x \le a. \end{cases}$
If $m_1$, $m_2$ are the first two moments of this distribution and $\alpha_3 = \mu_3/\mu_2^{3/2}$ is the
coefficient of skewness, show that $a$, $\mu$, $\sigma$ are given by
$a = m_1 - \dfrac{\sqrt{m_2 - m_1^2}}{\eta}, \qquad \sigma^2 = \log(1 + \eta^2),$
and
$\mu = \log(m_1 - a) - \tfrac{1}{2}\sigma^2,$
where $\eta$ is the real root of the equation $\eta^3 + 3\eta - \alpha_3 = 0$.

21. Let $X \sim G(\alpha, \beta)$ and let $Y \sim U(0, X)$.
(a) Find the PDF of $Y$.
(b) Find the conditional PDF of $X$ given $Y = y$.
(c) Find $P(X + Y \le 2)$.

22. Let $X$ and $Y$ be iid $\mathcal{N}(0, 1)$ RVs. Find the PDF of $X/|Y|$. Also, find the PDF of
$|X|/|Y|$.

23. It is known that $X \sim B(\alpha, \beta)$, and $P(X \le 0.2) = 0.22$. If $\alpha + \beta = 26$, find $\alpha$
and $\beta$. (Hint: Use Table ST1.)

24. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Find the distribution of
$Y_n = \dfrac{\sum_{k=1}^{n} kX_k - \mu \sum_{k=1}^{n} k}{\left( \sum_{k=1}^{n} k^2 \right)^{1/2}}.$

25. Let $F_1, F_2, \ldots, F_n$ be $n$ DFs. Show that $\min[F_1(x_1), F_2(x_2), \ldots, F_n(x_n)]$ is an
$n$-dimensional DF with marginal DFs $F_1, F_2, \ldots, F_n$. (Kemp [48])

26. Let $X \sim NB(1; p)$ and $Y \sim G(1, 1/\lambda)$. Show that $X$ and $Y$ are related by the
equation
$P\{X \le x\} = P\{Y \le [x]\}$ for $x > 0$, $\lambda = \log\left( \dfrac{1}{1 - p} \right),$
where $[x]$ is the largest integer $\le x$. Equivalently, show that
$P\{Y \in (n, n + 1]\} = P_{\theta}\{X = n\},$
where $\theta = 1 - e^{-\lambda}$. (Prochaska [80])

27. Let $T$ be an RV with DF $F$ and write $S(t) = 1 - F(t) = P(T > t)$. The
function $S$ is called the survival (or reliability) function of $T$ (or DF $F$). The
function $\lambda(t) = f(t)/S(t)$ is called the hazard (or failure-rate) function. For the
following PDFs, find the hazard function:
(a) Rayleigh: $f(t) = (t/\alpha^2) \exp(-t^2/2\alpha^2)$, $t > 0$.
(b) Lognormal: $f(t) = 1/(t\sigma\sqrt{2\pi}) \exp[-(\ln t - \mu)^2/2\sigma^2]$.
(c) Pareto: $f(t) = \alpha\theta^{\alpha}/t^{\alpha+1}$, $t > \theta$, and $= 0$ otherwise.
(d) Weibull: $f(t) = (\alpha/\beta)t^{\alpha-1} \exp(-t^{\alpha}/\beta)$, $t > 0$.
(e) Logistic: $f(t) = (1/\beta) \exp[-(t - \mu)/\beta]\{1 + \exp[-(t - \mu)/\beta]\}^{-2}$, $-\infty <
t < \infty$.

28. Consider the PDF
$f(x) = \left( \dfrac{\lambda}{2\pi x^3} \right)^{1/2} \exp\left\{ -\dfrac{\lambda(x - \mu)^2}{2\mu^2 x} \right\}, \qquad x > 0,$
and $= 0$ otherwise. An RV $X$ with PDF $f$ is said to have an inverse Gaussian
distribution with parameters $\mu$ and $\lambda$, both positive. Show that
$EX = \mu, \qquad \operatorname{var}(X) = \dfrac{\mu^3}{\lambda}, \qquad$ and
$M(t) = E\exp(tX) = \exp\left\{ \dfrac{\lambda}{\mu} \left[ 1 - \left( 1 - \dfrac{2t\mu^2}{\lambda} \right)^{1/2} \right] \right\}.$

29. Let $f$ be the PDF of an $\mathcal{N}(\mu, \sigma^2)$ RV.
(a) For what value of $c$ is the function $cf^n$, $n > 0$, a PDF?
(b) Let $\Phi$ be the DF of $Z \sim \mathcal{N}(0, 1)$. Find $E[Z\Phi(Z)]$ and $E[Z^2\Phi(Z)]$.

5.4 BIVARIATE AND MULTIVARIATE NORMAL DISTRIBUTIONS 


In this section we introduce the bivariate and multivariate normal distributions and 
investigate some of their important properties. We note that bivariate analogs of other
PDFs are known, but they are not always uniquely identified. For example, there are
several versions of bivariate exponential PDFs, so called because each has exponential
marginals. We will not encounter any of these bivariate PDFs in this book.


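Definition 1. A two-dimensional RV (X, Y) is said to have a bivariate normal
distribution if the joint PDF is of the form

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-Q(x,y)/2}, \quad -\infty < x < \infty,\; -\infty < y < \infty, \tag{1}$$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$, and Q is the positive definite quadratic form

$$Q(x, y) = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]. \tag{2}$$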


Figure 1 gives graphs of the bivariate normal PDF for selected values of $\rho$.
We first show that (1) indeed defines a joint PDF. In fact, we prove the following 
result. 


Theorem 1. The function defined by (1) and (2) with $\sigma_1 > 0$, $\sigma_2 > 0$, $|\rho| < 1$
is a joint PDF. The marginal PDFs of X and Y are, respectively, $N(\mu_1, \sigma_1^2)$ and
$N(\mu_2, \sigma_2^2)$, and $\rho$ is the correlation coefficient between X and Y.


[Fig. 1. Bivariate normal with $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$, and $\rho = -0.9, -0.5, 0.5, 0.9$; panels (a) to (d).]

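Proof. Let $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$. Note that

$$(1-\rho^2)\,Q(x, y) = \left(\frac{y-\mu_2}{\sigma_2} - \rho\,\frac{x-\mu_1}{\sigma_1}\right)^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2$$

$$= \left\{\frac{y - [\mu_2 + \rho(\sigma_2/\sigma_1)(x-\mu_1)]}{\sigma_2}\right\}^2 + (1-\rho^2)\left(\frac{x-\mu_1}{\sigma_1}\right)^2.$$

It follows that

$$f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left[-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right]\int_{-\infty}^{\infty}\frac{\exp\{-(y-\beta_x)^2/[2\sigma_2^2(1-\rho^2)]\}}{\sigma_2\sqrt{1-\rho^2}\,\sqrt{2\pi}}\,dy, \tag{3}$$

where we have written

$$\beta_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(x - \mu_1). \tag{4}$$

The integrand is the PDF of an $N(\beta_x, \sigma_2^2(1-\rho^2))$ RV, so that

$$f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right], \quad -\infty < x < \infty.$$

Thus

$$\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right]dx = \int_{-\infty}^{\infty} f_1(x)\,dx = 1,$$

and f(x, y) is a joint PDF of two RVs of the continuous type. It also follows that $f_1$
is the marginal PDF of X, so that X is $N(\mu_1, \sigma_1^2)$. In a similar manner we can show
that Y is $N(\mu_2, \sigma_2^2)$.

Furthermore, we have

$$\frac{f(x, y)}{f_1(x)} = \frac{1}{\sigma_2\sqrt{1-\rho^2}\,\sqrt{2\pi}}\exp\left\{-\frac{(y-\beta_x)^2}{2\sigma_2^2(1-\rho^2)}\right\}, \tag{5}$$

where $\beta_x$ is given by (4). It is clear, then, that the conditional PDF $f_{Y|X}(y \mid x)$ given
by (5) is also normal, with parameters $\beta_x$ and $\sigma_2^2(1-\rho^2)$. We have

$$E\{Y \mid x\} = \beta_x = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(x - \mu_1) \tag{6}$$

and

$$\operatorname{var}\{Y \mid x\} = \sigma_2^2(1-\rho^2). \tag{7}$$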
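
The factorization of the joint PDF into the marginal of X and the normal conditional PDF of Y given X = x suggests a two-step way to simulate a bivariate normal pair. The following is a minimal sketch using only the Python standard library; the function names and the parameter values are illustrative assumptions, not part of the text.

```python
import math
import random


def sample_bivariate_normal(mu1, mu2, sigma1, sigma2, rho, n, seed=0):
    """Draw n pairs (x, y): first X ~ N(mu1, sigma1^2), then Y | X = x from (6)-(7)."""
    rng = random.Random(seed)
    cond_sd = sigma2 * math.sqrt(1.0 - rho ** 2)  # square root of (7)
    pairs = []
    for _ in range(n):
        x = rng.gauss(mu1, sigma1)
        beta_x = mu2 + rho * (sigma2 / sigma1) * (x - mu1)  # conditional mean (6)
        y = rng.gauss(beta_x, cond_sd)
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Ordinary sample correlation coefficient of the pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = sample_bivariate_normal(5.0, 8.0, 4.0, 3.0, 0.6, n=50_000)
print("sample correlation:", round(sample_correlation(pairs), 3))
```

With a large sample the printed correlation should be close to the value of $\rho$ passed in, which is what Theorem 1 asserts.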
In order to show that $\rho$ is the correlation coefficient between X and Y, it suffices
to show that $\operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2$. We have from (6)

$$E(XY) = E\{E\{XY \mid X\}\} = E\left\{X\left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(X - \mu_1)\right]\right\} = \mu_1\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,\sigma_1^2.$$

It follows that

$$\operatorname{cov}(X, Y) = E(XY) - \mu_1\mu_2 = \rho\sigma_1\sigma_2.$$


Remark 1. If $\rho^2 = 1$, then (1) becomes meaningless. But in that case we know
(Theorem 4.5.1) that there exist constants a and b such that P{Y = aX + b} = 1. 
We thus have a univariate distribution, which is called the bivariate degenerate (or 
singular) normal distribution. The bivariate degenerate normal distribution does not 
have a PDF but corresponds to an RV (X, Y) whose marginal distributions are normal 
or degenerate and are such that (X, Y) falls on a fixed line with probability 1. It is for 
this reason that degenerate distributions are considered as normal distributions with 
variance 0. 


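Next we compute the MGF $M(t_1, t_2)$ of a bivariate normal RV (X, Y). If f(x, y)
is the PDF given in (1) and $f_1$ is the marginal PDF of X, we have

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\,dx\,dy$$

$$= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\,e^{t_2 y}\,dy\right] e^{t_1 x} f_1(x)\,dx$$

$$= \int_{-\infty}^{\infty} e^{t_1 x} f_1(x)\exp\left\{\tfrac{1}{2}\sigma_2^2 t_2^2(1-\rho^2) + t_2\left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]\right\}dx$$

$$= \exp\left[\tfrac{1}{2}\sigma_2^2 t_2^2(1-\rho^2) + t_2\mu_2 - \rho t_2\,\frac{\sigma_2}{\sigma_1}\,\mu_1\right]\int_{-\infty}^{\infty} e^{t_1 x}\, e^{\rho(\sigma_2/\sigma_1) t_2 x} f_1(x)\,dx.$$

Now

$$\int_{-\infty}^{\infty} e^{[t_1 + \rho t_2(\sigma_2/\sigma_1)]x} f_1(x)\,dx = \exp\left[\mu_1\left(t_1 + \rho\,\frac{\sigma_2}{\sigma_1}\,t_2\right) + \tfrac{1}{2}\sigma_1^2\left(t_1 + \rho\,\frac{\sigma_2}{\sigma_1}\,t_2\right)^2\right].$$

Therefore,

$$M(t_1, t_2) = \exp\left(\mu_1 t_1 + \mu_2 t_2 + \frac{\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2 + 2\rho\sigma_1\sigma_2 t_1 t_2}{2}\right). \tag{8}$$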
The following result is an immediate consequence of (8).


Theorem 2. If (X, Y) has a bivariate normal distribution, X and Y are independent
if and only if $\rho = 0$.
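
To see why Theorem 2 follows from (8), one can check numerically that the joint MGF factors into the product of the two marginal normal MGFs exactly when the cross term vanishes. A small sketch follows; the function names are hypothetical and the parameter values are arbitrary.

```python
import math


def bivariate_normal_mgf(t1, t2, mu1, mu2, sigma1, sigma2, rho):
    """Joint MGF M(t1, t2) as given in (8)."""
    quad = (sigma1 ** 2 * t1 ** 2 + sigma2 ** 2 * t2 ** 2
            + 2.0 * rho * sigma1 * sigma2 * t1 * t2)
    return math.exp(mu1 * t1 + mu2 * t2 + quad / 2.0)


def normal_mgf(t, mu, sigma):
    """MGF of a univariate N(mu, sigma^2) RV."""
    return math.exp(mu * t + sigma ** 2 * t ** 2 / 2.0)


params = dict(mu1=1.0, mu2=-2.0, sigma1=1.5, sigma2=0.5)
for rho in (0.0, 0.5):
    joint = bivariate_normal_mgf(0.3, -0.4, rho=rho, **params)
    product = normal_mgf(0.3, 1.0, 1.5) * normal_mgf(-0.4, -2.0, 0.5)
    print(rho, math.isclose(joint, product))
```

The comparison holds for $\rho = 0$ and fails for $\rho \ne 0$ whenever $t_1 t_2 \ne 0$, because the only term in (8) that couples $t_1$ and $t_2$ is $\rho\sigma_1\sigma_2 t_1 t_2$.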


Remark 2. It is quite possible for an RV (X, Y) to have a bivariate density such 
that the marginal densities of X and Y are normal and the correlation coefficient is 
0, yet X and Y are not independent. Indeed, if the marginal densities of X and Y are 
normal, it does not follow that the joint density of (X, Y) is a bivariate normal. Let 


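$$f(x, y) = \frac{1}{2}\left\{\frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\left[\frac{-1}{2(1-\rho^2)}\,(x^2 - 2\rho xy + y^2)\right] + \frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\left[\frac{-1}{2(1-\rho^2)}\,(x^2 + 2\rho xy + y^2)\right]\right\}. \tag{9}$$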


Here f(x, y) is a joint PDF such that both marginal densities are normal, f(x, y) 
is not bivariate normal, and X and Y have zero correlation. But X and Y are not 
independent. We have 


$$f_1(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty,$$

$$f_2(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}, \quad -\infty < y < \infty,$$

and

$$EXY = 0.$$

Example 1 (Rosenberg [91]). Let f and g be PDFs with corresponding DFs F
and G. Also, let

$$h(x, y) = f(x)\,g(y)\,\{1 + \alpha[2F(x) - 1][2G(y) - 1]\}, \tag{10}$$

where $|\alpha| < 1$ is a constant. It was shown in Example 4.3.1 that h is a bivariate
density function with given marginal densities f and g.
In particular, take f and g to be the PDF of $N(0, 1)$, that is,

$$f(x) = g(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty, \tag{11}$$

and let (X, Y) have the joint PDF h(x, y). We will show that X + Y is not normal
except in the trivial case $\alpha = 0$, when X and Y are independent.

Let Z = X + Y. Then

$$EZ = 0, \qquad \operatorname{var}(Z) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y).$$
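It is easy to show (Problem 2) that $\operatorname{cov}(X, Y) = \alpha/\pi$, so that $\operatorname{var}(Z) = 2[1 + (\alpha/\pi)]$.
If Z is normal, its MGF must be

$$M_1(t) = e^{t^2[1 + (\alpha/\pi)]}. \tag{12}$$

Next we compute the MGF of Z directly from the joint PDF (10). We have

$$M_2(t) = E\{e^{tX + tY}\} = e^{t^2} + \alpha\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{tx + ty}[2F(x) - 1][2F(y) - 1] f(x) f(y)\,dx\,dy$$

$$= e^{t^2} + \alpha\left\{\int_{-\infty}^{\infty} e^{tx}[2F(x) - 1] f(x)\,dx\right\}^2.$$

Now

$$\int_{-\infty}^{\infty} e^{tx}[2F(x) - 1] f(x)\,dx = -2\int_{-\infty}^{\infty} e^{tx}[1 - F(x)] f(x)\,dx + e^{t^2/2}$$

$$= e^{t^2/2} - 2\int_{-\infty}^{\infty}\int_{x}^{\infty}\frac{1}{2\pi}\exp\left[-\tfrac{1}{2}(x^2 + u^2 - 2tx)\right]du\,dx$$

$$= e^{t^2/2} - \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{0}^{\infty}\exp\left\{-\tfrac{1}{2}[x^2 + (v + x)^2 - 2tx]\right\}dv\,dx$$

$$= e^{t^2/2} - \int_{0}^{\infty}\frac{\exp[-v^2/2 + (v - t)^2/4]}{\sqrt{\pi}}\int_{-\infty}^{\infty}\frac{\exp\{-[x + (v - t)/2]^2\}}{\sqrt{\pi}}\,dx\,dv$$

$$= e^{t^2/2} - e^{t^2/2}\int_{0}^{\infty}\frac{1}{\sqrt{\pi}}\exp\left\{-\tfrac{1}{2}\left[\frac{(v + t)^2}{2}\right]\right\}dv$$

$$= e^{t^2/2} - 2e^{t^2/2}\,P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}, \tag{13}$$

where $Z_1$ is an $N(0, 1)$ RV.
It follows that

$$M_2(t) = e^{t^2} + \alpha\left(e^{t^2/2} - 2e^{t^2/2}\,P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right)^2 = e^{t^2}\left[1 + \alpha\left(1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right)^2\right]. \tag{14}$$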
If Z were normally distributed, we must have $M_1(t) = M_2(t)$ for all t and all
$|\alpha| < 1$, that is,

$$e^{t^2} e^{(\alpha/\pi)t^2} = e^{t^2}\left[1 + \alpha\left(1 - 2P\left\{Z_1 > \frac{t}{\sqrt{2}}\right\}\right)^2\right]. \tag{15}$$

For $\alpha = 0$, the equality clearly holds. The expression within the brackets on the right
side of (15) is bounded by $1 + \alpha$, whereas the expression $e^{(\alpha/\pi)t^2}$ is unbounded, so
the equality cannot hold for all t and $\alpha$.
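
The boundedness argument can be checked numerically by evaluating both sides of (15). The sketch below uses the identity $P\{Z_1 > a\} = \tfrac12\operatorname{erfc}(a/\sqrt{2})$ for a standard normal RV; the function names are illustrative.

```python
import math


def mgf_if_normal(t, alpha):
    """M_1(t) of (12): the MGF that Z = X + Y would have if it were normal."""
    return math.exp(t * t * (1.0 + alpha / math.pi))


def mgf_actual(t, alpha):
    """M_2(t) of (14), computed from the joint PDF (10)."""
    tail = 0.5 * math.erfc(t / 2.0)  # P{Z_1 > t / sqrt(2)}
    return math.exp(t * t) * (1.0 + alpha * (1.0 - 2.0 * tail) ** 2)


for t in (0.1, 1.0, 3.0):
    print(t, mgf_if_normal(t, 0.5), mgf_actual(t, 0.5))
```

For small t the two values are close, since the two functions share the same mean and variance, but they separate quickly as t grows when $\alpha \ne 0$.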


Next we investigate the multivariate normal distribution of dimension n, $n \ge 2$.
Let M be an $n \times n$ real, symmetric, and positive definite matrix. Let x denote the
$n \times 1$ column vector of real numbers $(x_1, x_2, \ldots, x_n)'$, and let $\mu$ denote the column
vector $(\mu_1, \mu_2, \ldots, \mu_n)'$, where $\mu_i$ (i = 1, 2, ..., n) are real constants.


Theorem 3. The nonnegative function

$$f(x) = c\exp\left\{-\frac{(x - \mu)' M (x - \mu)}{2}\right\}, \quad -\infty < x_i < \infty,\; i = 1, 2, \ldots, n, \tag{16}$$

defines the joint PDF of some random vector $X = (X_1, X_2, \ldots, X_n)'$, provided that
the constant c is chosen appropriately. The MGF of X exists and is given by

$$M(t_1, t_2, \ldots, t_n) = \exp\left(t'\mu + \frac{t' M^{-1} t}{2}\right), \tag{17}$$

where $t = (t_1, t_2, \ldots, t_n)'$ and $t_1, t_2, \ldots, t_n$ are arbitrary real numbers.
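Proof. Let

$$I = c\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left\{t'x - \frac{(x - \mu)' M (x - \mu)}{2}\right\}\prod_{i=1}^{n} dx_i. \tag{18}$$

Changing the variables of integration to $y_1, y_2, \ldots, y_n$ by writing $x_i - \mu_i = y_i$,
i = 1, 2, ..., n, and $y = (y_1, y_2, \ldots, y_n)'$, we have $x - \mu = y$ and

$$I = c\exp(t'\mu)\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left(t'y - \frac{y' M y}{2}\right)\prod_{i=1}^{n} dy_i. \tag{19}$$

Since M is positive definite, it follows that all the n characteristic roots of M, say
$m_1, m_2, \ldots, m_n$, are positive. Moreover, since M is symmetric, there exists an $n \times n$
orthogonal matrix L such that $L'ML$ is a diagonal matrix with diagonal elements
$m_1, m_2, \ldots, m_n$. Let us change the variables to $z_1, z_2, \ldots, z_n$ by writing $y = Lz$,
where $z' = (z_1, z_2, \ldots, z_n)$, and note that the Jacobian of this orthogonal transformation
is $|L|$. Since $L'L = I_n$, where $I_n$ is an $n \times n$ unit matrix, $|L| = 1$ and we
have

$$I = c\exp(t'\mu)\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\left(t'Lz - \frac{z'L'MLz}{2}\right)\prod_{i=1}^{n} dz_i. \tag{20}$$

If we write $t'L = u' = (u_1, u_2, \ldots, u_n)$, then $t'Lz = \sum_{i=1}^{n} u_i z_i$. Also, $L'ML =
\operatorname{diag}(m_1, m_2, \ldots, m_n)$, so that $z'L'MLz = \sum_{i=1}^{n} m_i z_i^2$. The integral in (20) can
therefore be written as

$$\prod_{i=1}^{n}\left[\int_{-\infty}^{\infty}\exp\left(u_i z_i - \frac{m_i z_i^2}{2}\right)dz_i\right] = \prod_{i=1}^{n}\left[\left(\frac{2\pi}{m_i}\right)^{1/2}\exp\left(\frac{u_i^2}{2m_i}\right)\right].$$

It follows that

$$I = c\exp(t'\mu)\,\frac{(2\pi)^{n/2}}{(m_1 m_2\cdots m_n)^{1/2}}\exp\left(\sum_{i=1}^{n}\frac{u_i^2}{2m_i}\right). \tag{21}$$

Setting $t_1 = t_2 = \cdots = t_n = 0$, we see from (18) and (21) that

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\cdots dx_n = c\,\frac{(2\pi)^{n/2}}{(m_1 m_2\cdots m_n)^{1/2}}.$$

By choosing

$$c = \frac{(m_1 m_2\cdots m_n)^{1/2}}{(2\pi)^{n/2}}, \tag{22}$$

we see that f is a joint PDF of some random vector X, as asserted.
Finally, since

$$(L'ML)^{-1} = \operatorname{diag}(m_1^{-1}, m_2^{-1}, \ldots, m_n^{-1}),$$

we have

$$\sum_{i=1}^{n}\frac{u_i^2}{m_i} = u'(L'M^{-1}L)u = t'M^{-1}t.$$

Also,

$$|M^{-1}| = |L'M^{-1}L| = (m_1 m_2\cdots m_n)^{-1}.$$

It follows from (21) and (22) that the MGF of X is given by (17), and we may write

$$c = \frac{1}{\{(2\pi)^n |M^{-1}|\}^{1/2}}. \tag{23}$$

This completes the proof of Theorem 3.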
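
As a sanity check on (22), one can take n = 2, where the product of the characteristic roots equals the determinant of M, and integrate the resulting density numerically. The sketch below uses a plain midpoint rule on a large square; the matrix, grid width, and step size are arbitrary illustrative choices.

```python
import math


def quad_form(m, x, y):
    """x'Mx for a symmetric 2 x 2 matrix M and the vector (x, y)."""
    return m[0][0] * x * x + 2.0 * m[0][1] * x * y + m[1][1] * y * y


def normalizing_constant(m):
    """c of (22) for n = 2; det M is the product of the characteristic roots."""
    det = m[0][0] * m[1][1] - m[0][1] ** 2
    return math.sqrt(det) / (2.0 * math.pi)


def integrate_density(m, half_width=8.0, step=0.05):
    """Midpoint-rule integral of c * exp(-x'Mx / 2) over a square centred at the origin."""
    c = normalizing_constant(m)
    k = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(k):
        x = -half_width + (i + 0.5) * step
        for j in range(k):
            y = -half_width + (j + 0.5) * step
            total += math.exp(-0.5 * quad_form(m, x, y))
    return c * total * step * step


M = [[2.0, 0.5], [0.5, 1.0]]  # symmetric and positive definite
print(integrate_density(M))
```

The printed value should be very close to 1, in agreement with the choice of c in (22) with $\mu = 0$.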
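Let us write $M^{-1} = ((\sigma_{ij}))_{i,j=1,2,\ldots,n}$. Then

$$M(0, 0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left(t_i\mu_i + \sigma_{ii}\,\frac{t_i^2}{2}\right)$$

is the MGF of $X_i$, i = 1, 2, ..., n. Thus each $X_i$ is $N(\mu_i, \sigma_{ii})$, i = 1, 2, ..., n. For
$i \ne j$, we have for the MGF of $X_i$ and $X_j$

$$M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0) = \exp\left(t_i\mu_i + t_j\mu_j + \frac{\sigma_{ii}t_i^2 + 2\sigma_{ij}t_i t_j + \sigma_{jj}t_j^2}{2}\right).$$

This is the MGF of a bivariate normal distribution with means $\mu_i$, $\mu_j$, variances $\sigma_{ii}$,
$\sigma_{jj}$, and covariance $\sigma_{ij}$. Thus we see that

$$\mu' = (\mu_1, \mu_2, \ldots, \mu_n) \tag{24}$$

is the mean vector of $X' = (X_1, \ldots, X_n)$,

$$\sigma_{ii} = \sigma_i^2 = \operatorname{var}(X_i), \quad i = 1, 2, \ldots, n, \tag{25}$$

and

$$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j, \quad i \ne j;\; i, j = 1, 2, \ldots, n. \tag{26}$$

The matrix $M^{-1}$ is called the dispersion (variance-covariance) matrix of the multivariate
normal distribution.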

If $\sigma_{ij} = 0$ for $i \ne j$, the matrix $M^{-1}$ is a diagonal matrix, and it follows that
the RVs $X_1, X_2, \ldots, X_n$ are independent. Thus we have the following analog of
Theorem 2.


Theorem 4. The components $X_1, X_2, \ldots, X_n$ of a jointly normally distributed
RV X are independent if and only if the covariances $\sigma_{ij} = 0$ for all $i \ne j$ (i, j =
1, 2, ..., n).


The following result is stated without proof. The proof is similar to the two-variate
case except that now we consider the quadratic form in n variables:
$E\left\{\sum_{i=1}^{n} t_i(X_i - \mu_i)\right\}^2 \ge 0$.


Theorem 5. The probability that the RVs $X_1, X_2, \ldots, X_n$ with finite variances
satisfy at least one linear relationship is 1 if and only if $|M| = 0$.

Accordingly, if $|M| = 0$, all the probability mass is concentrated on a hyperplane
of dimension < n.
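Theorem 6. Let $(X_1, X_2, \ldots, X_n)$ be an n-dimensional RV with a normal distribution.
Let $Y_1, Y_2, \ldots, Y_k$, $k \le n$, be linear functions of $X_j$ (j = 1, 2, ..., n).
Then $(Y_1, Y_2, \ldots, Y_k)$ also has a multivariate normal distribution.


Proof. Without loss of generality let us assume that $EX_i = 0$, i = 1, 2, ..., n.
Let

$$Y_p = \sum_{j=1}^{n} A_{pj} X_j, \quad p = 1, 2, \ldots, k;\; k \le n. \tag{27}$$

Then $EY_p = 0$, p = 1, 2, ..., k, and

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i,j=1}^{n} A_{pi} A_{qj}\,\sigma_{ij}, \tag{28}$$

where $E(X_i X_j) = \sigma_{ij}$, i, j = 1, 2, ..., n.
The MGF of $(Y_1, Y_2, \ldots, Y_k)$ is given by

$$M^*(t_1, t_2, \ldots, t_k) = E\left\{\exp\left(t_1\sum_{j=1}^{n} A_{1j}X_j + \cdots + t_k\sum_{j=1}^{n} A_{kj}X_j\right)\right\}.$$

Writing $u_j = \sum_{p=1}^{k} t_p A_{pj}$, j = 1, 2, ..., n, we have

$$M^*(t_1, t_2, \ldots, t_k) = E\left\{\exp\left(\sum_{i=1}^{n} u_i X_i\right)\right\} \tag{29}$$

$$= \exp\left(\frac{1}{2}\sum_{i,j=1}^{n}\sigma_{ij}u_i u_j\right) \quad \text{by (17)}$$

$$= \exp\left(\frac{1}{2}\sum_{i,j=1}^{n}\sigma_{ij}\sum_{l,m=1}^{k} t_l t_m A_{li}A_{mj}\right)$$

$$= \exp\left(\frac{1}{2}\sum_{l,m=1}^{k} t_l t_m\sum_{i,j=1}^{n} A_{li}A_{mj}\,\sigma_{ij}\right)$$

$$= \exp\left(\frac{1}{2}\sum_{l,m=1}^{k} t_l t_m\operatorname{cov}(Y_l, Y_m)\right).$$

When (17) and (29) are compared, the result follows.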


Corollary 1. Every marginal distribution of an n-dimensional normal distribution
is univariate normal. Moreover, any linear function of $X_1, X_2, \ldots, X_n$ is univariate
normal.
Corollary 2. If $X_1, X_2, \ldots, X_n$ are iid $N(\mu, \sigma^2)$, and A is an $n \times n$ orthogonal
transformation matrix, the components $Y_1, Y_2, \ldots, Y_n$ of $Y = AX$, where
$X = (X_1, \ldots, X_n)'$, are independent RVs, each normally distributed with the same
variance $\sigma^2$.
We have from (27) and (28)

$$\operatorname{cov}(Y_p, Y_q) = \sum_{i=1}^{n} A_{pi}A_{qi}\,\sigma_{ii} + \sum_{i \ne j} A_{pi}A_{qj}\,\sigma_{ij} = \begin{cases} 0 & \text{if } p \ne q, \\ \sigma^2 & \text{if } p = q, \end{cases}$$

since $\sum_{i=1}^{n} A_{pi}A_{qi} = 0$ for $p \ne q$ and $\sum_{j=1}^{n} A_{pj}^2 = 1$. It follows that

$$M^*(t_1, t_2, \ldots, t_n) = \exp\left(\frac{1}{2}\sum_{l=1}^{n} t_l^2\sigma^2\right),$$

and Corollary 2 follows.
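
The covariance computation in Corollary 2 amounts to the matrix identity $A(\sigma^2 I_n)A' = \sigma^2 I_n$ for orthogonal A. A brief sketch with a 2 × 2 rotation matrix illustrates it; the helper names and the choice of rotation angle are assumptions made for the example.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation(theta):
    """A 2 x 2 orthogonal matrix (rotation by theta radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transformed_dispersion(a, sigma2):
    """Dispersion matrix of Y = AX when the X_i are uncorrelated with variance sigma2."""
    n = len(a)
    disp = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(a, disp), transpose(a))


for row in transformed_dispersion(rotation(0.7), sigma2=2.5):
    print([round(v, 12) for v in row])
```

The off-diagonal entries vanish (up to rounding) and the diagonal entries equal $\sigma^2$, so by Theorem 4 the components of Y are independent.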


Theorem 7. Let $X = (X_1, X_2, \ldots, X_n)'$. Then X has an n-dimensional normal
distribution if and only if every linear function of X,

$$X't = t_1 X_1 + t_2 X_2 + \cdots + t_n X_n,$$

has a univariate normal distribution.


Proof. Suppose that $X't$ is normal for any t. Then the MGF of $X't$ is given by

$$M(s) = \exp\left(bs + \tfrac{1}{2}\sigma^2 s^2\right). \tag{30}$$

Here $b = E\{X't\} = \sum_{i=1}^{n} t_i\mu_i = t'\mu$, where $\mu' = (\mu_1, \ldots, \mu_n)$, and $\sigma^2 =
\operatorname{var}(X't) = \operatorname{var}(\sum t_i X_i) = t'M^{-1}t$, where $M^{-1}$ is the dispersion matrix of X. Thus

$$M(s) = \exp\left(t'\mu s + \tfrac{1}{2}\,t'M^{-1}t\,s^2\right). \tag{31}$$

Let s = 1; then

$$M(1) = \exp\left(t'\mu + \tfrac{1}{2}\,t'M^{-1}t\right), \tag{32}$$

and since the MGF is unique, it follows that X has a multivariate normal distribution.
The converse follows from Corollary 1 to Theorem 6.


Many characterization results for the multivariate normal distribution are now 
available. We refer the reader to Lukacs and Laha [67, p. 79]. 
PROBLEMS 5.4


1. Let (X, Y) have joint PDF

$$f(x, y) = \frac{1}{6\pi\sqrt{7}}\exp\left[-\frac{8}{7}\left(\frac{x^2}{16} - \frac{31x}{32} + \frac{xy}{8} + \frac{y^2}{9} - \frac{4y}{3} + \frac{71}{16}\right)\right]$$

for $-\infty < x < \infty$, $-\infty < y < \infty$.
(a) Find the means and variances of X and Y. Also find $\rho$.
(b) Find the conditional PDF of Y given X = x and E{Y|x}, var{Y|x}.
(c) Find P{4 < Y < 6 | X = 4}.
2. In Example 1, show that $\operatorname{cov}(X, Y) = \alpha/\pi$.


3. Let (X, Y) be a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.
What is the distribution of X + Y? Compare your result with that of Example 1.


4. Let (X, Y) be a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, and
let U = aX + b, $a \ne 0$, and V = cY + d, $c \ne 0$. Find the joint distribution of
(U, V).


5. Let (X, Y) be a bivariate normal RV with parameters $\mu_1 = 5$, $\mu_2 = 8$, $\sigma_1^2 = 16$,
$\sigma_2^2 = 9$, and $\rho = 0.6$. Find P{5 < Y < 11 | X = 2}.


6. Let X and Y be jointly normal with means 0. Also, let

$$W = X\cos\theta + Y\sin\theta, \qquad Z = X\cos\theta - Y\sin\theta.$$

Find $\theta$ such that W and Z are independent.


7. Let (X, Y) be a normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Find a necessary
and sufficient condition for X + Y and X − Y to be independent.


8. For a bivariate normal RV with parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$ show that

$$P(X > \mu_1, Y > \mu_2) = \frac{1}{4} + \frac{1}{2\pi}\tan^{-1}\frac{\rho}{\sqrt{1-\rho^2}}.$$

[Hint: The required probability is $P((X - \mu_1)/\sigma_1 > 0, (Y - \mu_2)/\sigma_2 > 0)$.
Change to polar coordinates and integrate.]


9. Show that every variance-covariance matrix is symmetric positive semidefinite
and conversely. If the variance-covariance matrix is not positive definite, then
with probability 1 the random (column) vector X lies in some hyperplane $c'X =
a$ with $c \ne 0$.


10. Let (X, Y) be a bivariate normal RV with EX = EY = 0, var(X) = var(Y) = 1, 
and cov(X, Y) = p. Show that the RV Z = Y/X has a Cauchy distribution. 
11. (a) Show that


is a joint PDF on $\mathcal{R}_n$.
(b) Let $(X_1, X_2, \ldots, X_n)$ have PDF f given in (a). Show that the RVs in any
proper subset of $\{X_1, X_2, \ldots, X_n\}$ containing two or more elements are
independent standard normal RVs.


5.5 EXPONENTIAL FAMILY OF DISTRIBUTIONS 


Most of the distributions that we have so far encountered belong to a general family
of distributions that we now study. Let $\Theta$ be an interval on the real line, and let
$\{f_\theta : \theta \in \Theta\}$ be a family of PDFs (PMFs). Here and in what follows we write
$x = (x_1, x_2, \ldots, x_n)$ unless otherwise specified.


Definition 1. If there exist real-valued functions $Q(\theta)$ and $D(\theta)$ on $\Theta$ and Borel-measurable
functions $T(x_1, x_2, \ldots, x_n)$ and $S(x_1, x_2, \ldots, x_n)$ on $\mathcal{R}_n$ such that

$$f_\theta(x_1, x_2, \ldots, x_n) = \exp[Q(\theta)T(x) + D(\theta) + S(x)], \tag{1}$$

we say that the family $\{f_\theta, \theta \in \Theta\}$ is a one-parameter exponential family.


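Let $X_1, X_2, \ldots, X_m$ be iid with PMF (PDF) $f_\theta$. Then the joint distribution of
$X = (X_1, X_2, \ldots, X_m)$ is given by

$$g_\theta(x) = \prod_{j=1}^{m} f_\theta(x_j) = \prod_{j=1}^{m}\exp[Q(\theta)T(x_j) + D(\theta) + S(x_j)]$$

$$= \exp\left[Q(\theta)\sum_{j=1}^{m} T(x_j) + mD(\theta) + \sum_{j=1}^{m} S(x_j)\right],$$

where $x = (x_1, x_2, \ldots, x_m)$, $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, j = 1, 2, ..., m, and it
follows that $\{g_\theta : \theta \in \Theta\}$ is again a one-parameter exponential family.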


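Example 1. Let $X \sim N(\mu_0, \sigma^2)$, where $\mu_0$ is known and $\sigma^2$ unknown. Then

$$f_{\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x - \mu_0)^2}{2\sigma^2}\right] = \exp\left[-\log(\sigma\sqrt{2\pi}) - \frac{(x - \mu_0)^2}{2\sigma^2}\right]$$

is a one-parameter exponential family with

$$Q(\sigma^2) = -\frac{1}{2\sigma^2}, \quad T(x) = (x - \mu_0)^2, \quad S(x) = 0, \quad \text{and} \quad D(\sigma^2) = -\log(\sigma\sqrt{2\pi}).$$

If $X \sim N(\mu, \sigma_0^2)$, where $\sigma_0$ is known but $\mu$ is unknown, then

$$f_\mu(x) = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left[-\frac{(x - \mu)^2}{2\sigma_0^2}\right] = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma_0^2} + \frac{\mu x}{\sigma_0^2} - \frac{\mu^2}{2\sigma_0^2}\right)$$

is a one-parameter exponential family with

$$Q(\mu) = \frac{\mu}{\sigma_0^2}, \quad D(\mu) = -\frac{\mu^2}{2\sigma_0^2}, \quad T(x) = x,$$

and

$$S(x) = -\left[\frac{x^2}{2\sigma_0^2} + \frac{1}{2}\log(2\pi\sigma_0^2)\right].$$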

Example 2. Let $X \sim P(\lambda)$, $\lambda > 0$ unknown. Then

$$P_\lambda\{X = x\} = \frac{e^{-\lambda}\lambda^x}{x!} = \exp[-\lambda + x\log\lambda - \log(x!)],$$

and we see that the family of Poisson PMFs with parameter $\lambda$ is a one-parameter
exponential family.
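
The decomposition in Example 2 can be verified numerically by reading off $Q(\lambda) = \log\lambda$, $T(x) = x$, $D(\lambda) = -\lambda$, and $S(x) = -\log(x!)$ and comparing with the PMF computed directly. A short sketch (function names are illustrative):

```python
import math


def poisson_pmf(x, lam):
    """P{X = x} for X ~ P(lam), computed directly."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_exponential_form(x, lam):
    """The same PMF written as exp[Q(lam) T(x) + D(lam) + S(x)]."""
    q = math.log(lam)          # Q(lambda)
    t = x                      # T(x)
    d = -lam                   # D(lambda)
    s = -math.lgamma(x + 1)    # S(x) = -log(x!)
    return math.exp(q * t + d + s)


for x in range(5):
    print(x, poisson_pmf(x, 2.5), poisson_exponential_form(x, 2.5))
```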

Some other important examples of one-parameter exponential families are binomial,
$G(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$ is fixed), $B(\alpha, \beta)$ (provided that one of $\alpha$, $\beta$
is fixed), negative binomial, and geometric. The Cauchy family of densities and the
uniform distribution on $[0, \theta]$ do not belong to this class.


Theorem 1. Let $\{f_\theta : \theta \in \Theta\}$ be a one-parameter exponential family of PDFs
(PMFs) given in (1). Then the family of distributions of T(X) is also a one-parameter
exponential family of PDFs (PMFs), given by

$$g_\theta(t) = \exp[tQ(\theta) + D(\theta) + S^*(t)]$$

for suitable $S^*(t)$.
The proof of Theorem 1 is a simple application of the transformation of variables
technique studied in Section 4.4 and is left as an exercise, at least for the cases
considered in Section 4.4. For the general case we refer to Lehmann [63, p. 58].

Let us now consider the k-parameter exponential family, $k \ge 2$. Let $\Theta \subseteq \mathcal{R}_k$ be a
k-dimensional interval.


Definition 2. If there exist real-valued functions $Q_1, Q_2, \ldots, Q_k$, D defined on
$\Theta$, and Borel-measurable functions $T_1, T_2, \ldots, T_k$, S on $\mathcal{R}_n$ such that

$$f_\theta(x) = \exp\left[\sum_{i=1}^{k} Q_i(\theta)T_i(x) + D(\theta) + S(x)\right], \tag{2}$$

we say that the family $\{f_\theta, \theta \in \Theta\}$ is a k-parameter exponential family.


Once again, if $X = (X_1, X_2, \ldots, X_m)$ and the $X_j$ are iid with common distribution
(2), the joint distributions of X form a k-parameter exponential family. An analog of
Theorem 1 also holds for the k-parameter exponential family.


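Example 3. The most important example of a k-parameter exponential family is
$N(\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown. We have

$$\theta = (\mu, \sigma^2), \qquad \Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\; \sigma^2 > 0\}$$

and

$$f_\theta(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2}\right) = \exp\left[-\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2}\,x - \frac{1}{2}\left(\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right].$$

It follows that $f_\theta$ is a two-parameter exponential family with

$$Q_1(\theta) = -\frac{1}{2\sigma^2}, \quad Q_2(\theta) = \frac{\mu}{\sigma^2}, \quad T_1(x) = x^2, \quad T_2(x) = x,$$

$$D(\theta) = -\frac{1}{2}\left[\frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2)\right], \quad \text{and} \quad S(x) = 0.$$

Other examples are the $G(\alpha, \beta)$ and $B(\alpha, \beta)$ distributions when both $\alpha$, $\beta$ are
unknown, and the multinomial distribution. $U[\alpha, \beta]$ does not belong to this family,
nor does $C(\alpha, \beta)$.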
Some general properties of exponential families will be studied in Chapter 8, and
the importance of these families will then become evident.


Remark 1. The form in (2) is not unique, as easily seen by substituting $\alpha Q_i$ for
$Q_i$ and $(1/\alpha)T_i$ for $T_i$. This, however, is not going to be a problem in statistical
considerations.


Remark 2. The integer k in Definition 2 is also not unique since the family
$\{1, Q_1, \ldots, Q_k\}$ or $\{1, T_1, \ldots, T_k\}$ may be linearly dependent. In general, k need
not be the dimension of $\Theta$.


Remark 3. The support $\{x : f_\theta(x) > 0\}$ does not depend on $\theta$.


Remark 4. In (2), one can change parameters to $\eta_i = Q_i(\theta)$, i = 1, 2, ..., k,
so that

$$f_\eta(x) = \exp\left[\sum_{i=1}^{k}\eta_i T_i(x) + D(\eta) + S(x)\right], \tag{3}$$

where the parameters $\eta = (\eta_1, \eta_2, \ldots, \eta_k)$ are called natural parameters. Again, the $\eta_i$
may be linearly dependent so that one of the $\eta_i$ may be eliminated.


PROBLEMS 5.5 


1. Show that the following families of distributions are one-parameter exponential 
families: 


(a) X ~ b(n, p). 

(b) $X \sim G(\alpha, \beta)$, (i) if $\alpha$ is known, and (ii) if $\beta$ is known.
(c) $X \sim B(\alpha, \beta)$, (i) if $\alpha$ is known, and (ii) if $\beta$ is known.
(d) $X \sim NB(r; p)$, where r is known, p unknown.


2. Let $X \sim C(1, \theta)$. Show that the family of distributions of X is not a one-parameter
exponential family.


3. Let $X \sim U[0, \theta]$, $\theta \in [0, \infty)$. Show that the family of distributions of X is not an
exponential family.


4. Is the family of PDFs

$$f_\theta(x) = \tfrac{1}{2}\,e^{-|x - \theta|}, \quad -\infty < x < \infty,\; \theta \in (-\infty, \infty),$$

an exponential family?




5. Show that the following families of distributions are two-parameter exponential 
families: 
(a) $X \sim G(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.
(b) $X \sim B(\alpha, \beta)$, both $\alpha$ and $\beta$ unknown.


6. Show that the families of distributions $U[\alpha, \beta]$ and $C(\alpha, \beta)$ do not belong to the
exponential families.


7. Show that the multinomial distributions form an exponential family. 


CHAPTER 6 


Limit Theorems 


6.1 INTRODUCTION 


In this chapter we investigate convergence properties of sequences of random vari- 
ables. The three limit results proved here, namely, the two laws of large numbers and 
the central limit theorem, are of considerable importance in the study of probability 
and statistics. Just as in analysis, we distinguish among several types of convergence. 
The various modes of convergence are introduced in Section 6.2. Sections 6.3 and 
6.4 deal with the laws of large numbers, and the central limit theorem is proved in 
Section 6.6. 

The reader may find some parts of this chapter difficult, at least on first reading. 
These have been identified with a dagger (†) and include the concept of almost sure
convergence (Section 6.2) and the strong law of large numbers (Section 6.4). Since 
the central limit result is basic and will be used repeatedly in the rest of the book, it 
is important for readers to familiarize themselves with this result and its application 
and to understand its significance. Similarly, on the first reading it will suffice to 
know the strong law of large numbers and to understand its significance. 


6.2 MODES OF CONVERGENCE 


In this section we consider several modes of convergence and investigate their inter- 
relationships. We begin with the weakest mode. 


Definition 1. Let $\{F_n\}$ be a sequence of distribution functions. If there exists a
DF F such that as $n \to \infty$,

$$F_n(x) \to F(x) \tag{1}$$

at every point x at which F is continuous, we say that $F_n$ converges in law (or,
weakly), to F, and we write $F_n \xrightarrow{w} F$.

If $\{X_n\}$ is a sequence of RVs and $\{F_n\}$ is the corresponding sequence of DFs, we
say that $X_n$ converges in distribution (or law) to X if there exists an RV X with DF
F such that $F_n \xrightarrow{w} F$. We write $X_n \xrightarrow{L} X$.
It must be remembered that it is quite possible for a given sequence of DFs to
converge to a function that is not a DF.


Example 1. Consider the sequence of DFs

$$F_n(x) = \begin{cases} 0, & x < n, \\ 1, & x \ge n. \end{cases}$$

Here $F_n(x)$ is the DF of the RV $X_n$ degenerate at x = n. We see that $F_n(x)$ converges
to a function F that is identically equal to 0, and hence is not a DF.


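Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common density function

$$f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \quad (0 < \theta < \infty), \\ 0, & \text{otherwise.} \end{cases}$$

Let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Then the density function of $X_{(n)}$ is

$$f_n(x) = \begin{cases} \dfrac{n x^{n-1}}{\theta^n}, & 0 < x < \theta, \\ 0, & \text{otherwise,} \end{cases}$$

and the DF of $X_{(n)}$ is

$$F_n(x) = \begin{cases} 0, & x < 0, \\ (x/\theta)^n, & 0 \le x < \theta, \\ 1, & x \ge \theta. \end{cases}$$

We see that as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < \theta, \\ 1, & x \ge \theta, \end{cases}$$

which is a DF. Thus $F_n \xrightarrow{w} F$.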
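
The convergence in Example 2 is easy to tabulate: for any fixed x below $\theta$ the value $(x/\theta)^n$ shrinks geometrically, while the value at $x \ge \theta$ stays at 1. A minimal sketch follows; the function name and the default $\theta = 1$ are illustrative choices.

```python
def max_uniform_cdf(x, n, theta=1.0):
    """DF of X_(n), the maximum of n iid U(0, theta) RVs."""
    if x < 0:
        return 0.0
    if x < theta:
        return (x / theta) ** n
    return 1.0


for n in (1, 10, 100, 1000):
    print(n, max_uniform_cdf(0.9, n), max_uniform_cdf(1.0, n))
```

Even at x = 0.9, close to $\theta = 1$, the DF is negligible for large n, which matches the limit DF degenerate at $\theta$.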


The following example shows that convergence in distribution does not imply 
convergence of moments. 


Example 3. Let $F_n$ be a sequence of DFs defined by

$$F_n(x) = \begin{cases} 0, & x < 0, \\ 1 - \dfrac{1}{n}, & 0 \le x < n, \\ 1, & n \le x. \end{cases}$$

Clearly, $F_n \xrightarrow{w} F$, where F is the DF given by

$$F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0. \end{cases}$$

Note that $F_n$ is the DF of the RV $X_n$ with PMF

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = n\} = \frac{1}{n},$$

and F is the DF of the RV X degenerate at 0. We have

$$EX_n^k = n^k\left(\frac{1}{n}\right) = n^{k-1},$$

where k is a positive integer. Also, $EX^k = 0$, so that

$$EX_n^k \nrightarrow EX^k \quad \text{for any } k \ge 1.$$


We next give an example to show that weak convergence of distribution functions 
does not imply the convergence of corresponding PMFs or PDFs. 


Example 4. Let $\{X_n\}$ be a sequence of RVs with PMF

$$f_n(x) = P\{X_n = x\} = \begin{cases} 1, & \text{if } x = 2 + \dfrac{1}{n}, \\ 0, & \text{otherwise.} \end{cases}$$

Note that none of the $f_n$'s assigns any probability to the point x = 2. It follows that

$$f_n(x) \to f(x) \quad \text{as } n \to \infty,$$

where f(x) = 0 for all x. However, the sequence of DFs $\{F_n\}$ of RVs $X_n$ converges
to the function

$$F(x) = \begin{cases} 0, & x < 2, \\ 1, & x \ge 2, \end{cases}$$

at all continuity points of F. Since F is the DF of the RV degenerate at x = 2,
$F_n \xrightarrow{w} F$.


The following result is easy to prove.
Theorem 1. Let $X_n$ be a sequence of integer-valued RVs. Also, let $f_n(k) =
P\{X_n = k\}$, k = 0, 1, 2, ..., be the PMF of $X_n$, n = 1, 2, ..., and $f(k) = P\{X =
k\}$ be the PMF of X. Then

$$f_n(x) \to f(x) \text{ for all } x \iff X_n \xrightarrow{L} X.$$
In the continuous case we state the following result of Scheffé [98] without proof.
Theorem 2. Let $X_n$, n = 1, 2, ..., and X be continuous RVs such that

$$f_n(x) \to f(x) \quad \text{for (almost) all } x \text{ as } n \to \infty.$$

Here, $f_n$ and f are the PDFs of $X_n$ and X, respectively. Then $X_n \xrightarrow{L} X$.


The following result is easy to establish. 


Theorem 3. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{L} X$, and let c be a
constant. Then


(a) $X_n + c \xrightarrow{L} X + c$, and

(b) $cX_n \xrightarrow{L} cX$, $c \ne 0$.

A slightly stronger concept of convergence is defined by convergence in proba- 
bility. 


Definition 2. Let $\{X_n\}$ be a sequence of RVs defined on some probability space
$(\Omega, \mathcal{S}, P)$. We say that the sequence $\{X_n\}$ converges in probability to the RV X if
for every $\varepsilon > 0$,

$$P\{|X_n - X| > \varepsilon\} \to 0 \quad \text{as } n \to \infty. \tag{2}$$

We write $X_n \xrightarrow{P} X$.


Remark 1. We emphasize that the definition says nothing about the convergence 
of the RVs X,, to the RV X in the sense in which it is understood in real analysis. 


Thus $X_n \xrightarrow{P} X$ does not imply that given $\varepsilon > 0$, we can find an N such that $|X_n -
X| < \varepsilon$ for $n \ge N$. Definition 2 speaks only of the convergence of the sequence of
probabilities $P\{|X_n - X| > \varepsilon\}$ to 0.


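Example 5. Let $\{X_n\}$ be a sequence of RVs with PMF

$$P\{X_n = 1\} = \frac{1}{n} \quad \text{and} \quad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then

$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = 1\} = \dfrac{1}{n} & \text{if } 0 < \varepsilon < 1, \\ 0 & \text{if } \varepsilon \ge 1. \end{cases}$$

It follows that $P\{|X_n| > \varepsilon\} \to 0$ as $n \to \infty$, and we conclude that $X_n \xrightarrow{P} 0$.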




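The truth of the following statements can easily be verified.


1. $X_n \xrightarrow{P} X \iff X_n - X \xrightarrow{P} 0$.

2. $X_n \xrightarrow{P} X$, $X_n \xrightarrow{P} Y \Rightarrow P\{X = Y\} = 1$, for
$P\{|X - Y| > c\} \le P\{|X_n - X| > c/2\} + P\{|X_n - Y| > c/2\}$, and it follows that $P\{|X - Y| > c\} = 0$ for every
$c > 0$.

3. $X_n \xrightarrow{P} X \Rightarrow X_n - X_m \xrightarrow{P} 0$ as $n, m \to \infty$, for

$$P\{|X_n - X_m| > \varepsilon\} \le P\left\{|X_n - X| > \frac{\varepsilon}{2}\right\} + P\left\{|X_m - X| > \frac{\varepsilon}{2}\right\}.$$

4. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n \pm Y_n \xrightarrow{P} X \pm Y$.

5. $X_n \xrightarrow{P} X$, k constant, $\Rightarrow kX_n \xrightarrow{P} kX$.

6. $X_n \xrightarrow{P} k \Rightarrow X_n^2 \xrightarrow{P} k^2$.

7. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, a, b constants $\Rightarrow X_n Y_n \xrightarrow{P} ab$, for

$$X_n Y_n = \frac{(X_n + Y_n)^2 - (X_n - Y_n)^2}{4} \xrightarrow{P} \frac{(a + b)^2 - (a - b)^2}{4} = ab.$$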


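8. $X_n \xrightarrow{P} 1 \Rightarrow X_n^{-1} \xrightarrow{P} 1$, for

$$P\left\{\left|\frac{1}{X_n} - 1\right| \ge \varepsilon\right\} = P\left\{\frac{1}{X_n} \ge 1 + \varepsilon\right\} + P\left\{\frac{1}{X_n} \le 0\right\} + P\left\{0 < \frac{1}{X_n} \le 1 - \varepsilon\right\},$$

and each of the three terms on the right goes to 0 as $n \to \infty$.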


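9. $X_n \xrightarrow{P} a$, $Y_n \xrightarrow{P} b$, a, b constants, $b \ne 0 \Rightarrow X_n Y_n^{-1} \xrightarrow{P} ab^{-1}$.

10. $X_n \xrightarrow{P} X$, and Y an RV $\Rightarrow X_n Y \xrightarrow{P} XY$. Note that Y is an RV, so that given
$\delta > 0$, there exists a $k > 0$ such that $P\{|Y| > k\} < \delta/2$. Thus

$$P\{|X_n Y - XY| > \varepsilon\} = P\{|X_n - X||Y| > \varepsilon, |Y| > k\} + P\{|X_n - X||Y| > \varepsilon, |Y| \le k\}$$

$$< \frac{\delta}{2} + P\left\{|X_n - X| > \frac{\varepsilon}{k}\right\}.$$

11. $X_n \xrightarrow{P} X$, $Y_n \xrightarrow{P} Y \Rightarrow X_n Y_n \xrightarrow{P} XY$, for

$$(X_n - X)(Y_n - Y) \xrightarrow{P} 0.$$

The result now follows on multiplication, using result 10. It also follows that
$X_n \xrightarrow{P} X \Rightarrow X_n^2 \xrightarrow{P} X^2$.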

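Theorem 4. Let $X_n \xrightarrow{P} X$, and let g be a continuous function defined on $\mathcal{R}$. Then
$g(X_n) \xrightarrow{P} g(X)$ as $n \to \infty$.


Proof. Since X is an RV, we can, given $\varepsilon > 0$, find a constant $k = k(\varepsilon)$ such
that

$$P\{|X| > k\} < \frac{\varepsilon}{2}.$$

Also, g is continuous on $\mathcal{R}$, so that g is uniformly continuous on $[-k, k]$. It follows
that there exists a $\delta = \delta(\varepsilon, k)$ such that

$$|g(x_n) - g(x)| < \varepsilon$$

whenever $|x| \le k$ and $|x_n - x| < \delta$. Let

$$A = \{|X| \le k\}, \quad B = \{|X_n - X| < \delta\}, \quad C = \{|g(X_n) - g(X)| < \varepsilon\}.$$

Then $\omega \in A \cap B \Rightarrow \omega \in C$, so that

$$A \cap B \subseteq C.$$

It follows that

$$P\{C^c\} \le P\{A^c\} + P\{B^c\},$$

that is,

$$P\{|g(X_n) - g(X)| \ge \varepsilon\} \le P\{|X_n - X| \ge \delta\} + P\{|X| > k\} < \varepsilon$$

for $n \ge N(\varepsilon, \delta, k)$, where $N(\varepsilon, \delta, k)$ is chosen so that

$$P\{|X_n - X| \ge \delta\} < \frac{\varepsilon}{2} \quad \text{for } n \ge N(\varepsilon, \delta, k).$$

Corollary 1. $X_n \xrightarrow{P} c$, where c is a constant $\Rightarrow g(X_n) \xrightarrow{P} g(c)$, g being a
continuous function.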


We remark that a more general result than Theorem 4 is true and state it without
proof (see Rao [86, p. 124]): $X_n \xrightarrow{L} X$, and g continuous on $\mathcal{R} \Rightarrow g(X_n) \xrightarrow{L} g(X)$.
The following two theorems explain the relationship between weak convergence 
and convergence in probability. 
Theorem 5. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.
Proof. Let $F_n$ and F, respectively, be the DFs of $X_n$ and X. We have

$$\{\omega : X(\omega) \le x'\} = \{\omega : X_n(\omega) \le x, X(\omega) \le x'\} \cup \{\omega : X_n(\omega) > x, X(\omega) \le x'\}$$

$$\subseteq \{X_n \le x\} \cup \{X_n > x, X \le x'\}.$$

It follows that

$$F(x') \le F_n(x) + P\{X_n > x, X \le x'\}.$$

Since $X_n - X \xrightarrow{P} 0$, we have for $x' < x$,

$$P\{X_n > x, X \le x'\} \le P\{|X_n - X| > x - x'\} \to 0 \quad \text{as } n \to \infty.$$

Therefore,

$$F(x') \le \liminf_{n \to \infty} F_n(x), \quad x' < x.$$

Similarly, by interchanging X and $X_n$, and x and x', we get

$$\limsup_{n \to \infty} F_n(x) \le F(x''), \quad x < x''.$$

Thus, for $x' < x < x''$, we have

$$F(x') \le \liminf F_n(x) \le \limsup F_n(x) \le F(x'').$$

Since F has only a countable number of discontinuity points, we choose x to be a
point of continuity of F, and letting $x'' \downarrow x$ and $x' \uparrow x$, we have

$$F(x) = \lim_{n \to \infty} F_n(x)$$

at all points of continuity of F.
Theorem 6. Let k be a constant. Then

$$X_n \xrightarrow{L} k \Rightarrow X_n \xrightarrow{P} k.$$

The proof is left as an exercise.


Corollary. Let k be a constant. Then

$$X_n \xrightarrow{L} k \iff X_n \xrightarrow{P} k.$$




Remark 2. We emphasize that we cannot improve the result above by replacing
k by an RV; that is, $X_n \xrightarrow{L} X$, in general, does not imply $X_n \xrightarrow{P} X$, for let
$X, X_1, X_2, \ldots$ be identically distributed RVs, and let the joint distribution of $(X_n, X)$
be as follows:


Clearly, $X_n \xrightarrow{L} X$. But

$$P\{|X_n - X| > \tfrac{1}{2}\} = P\{|X_n - X| = 1\} = P\{X_n = 0, X = 1\} + P\{X_n = 1, X = 0\} = 1 \nrightarrow 0.$$

Hence, $X_n \stackrel{P}{\nrightarrow} X$, but $X_n \xrightarrow{L} X$.
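
A concrete joint distribution consistent with the computation in Remark 2 puts probability 1/2 on each of the outcomes $(X_n, X) = (0, 1)$ and $(1, 0)$; this specific table is an assumption made here for illustration, since the table itself is not reproduced above. The sketch below enumerates it exactly with rational arithmetic.

```python
from fractions import Fraction

# Assumed joint PMF of (X_n, X), the same for every n: all mass on the off-diagonal.
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}


def marginal(joint_pmf, index):
    """Marginal PMF of coordinate `index` (0 for X_n, 1 for X)."""
    out = {}
    for outcome, p in joint_pmf.items():
        out[outcome[index]] = out.get(outcome[index], 0) + p
    return out


def prob_far_apart(joint_pmf, eps):
    """P{|X_n - X| > eps} under the joint PMF."""
    return sum(p for (xn, x), p in joint_pmf.items() if abs(xn - x) > eps)


print(marginal(joint, 0) == marginal(joint, 1))   # identical marginals
print(prob_far_apart(joint, Fraction(1, 2)))      # does not shrink with n
```

Because the marginals coincide for every n, convergence in law is trivial, yet the probability that $X_n$ and X differ by more than 1/2 stays at 1.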


Remark 3. Example 3 shows that $X_n \xrightarrow{P} X$ does not imply that $EX_n^k \to EX^k$
for any k > 0, k integral.


Definition 3. Let $\{X_n\}$ be a sequence of RVs such that $E|X_n|^r < \infty$ for some
$r > 0$. We say that $X_n$ converges in the rth mean to an RV X if $E|X|^r < \infty$ and

$$E|X_n - X|^r \to 0 \quad \text{as } n \to \infty, \tag{3}$$

and we write $X_n \xrightarrow{r} X$.


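Example 6. Let $\{X_n\}$ be a sequence of RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n}, \qquad P\{X_n = 1\} = \frac{1}{n}, \quad n = 1, 2, \ldots.$$

Then

$$E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$

and we see that $X_n \xrightarrow{2} X$, where RV X is degenerate at 0.


Theorem 7. Let $X_n \xrightarrow{r} X$ for some $r > 0$. Then $X_n \xrightarrow{P} X$.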
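The proof is left as an exercise.


Example 7. Let $\{X_n\}$ be a sequence of RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n^r} \quad \text{and} \quad P\{X_n = n\} = \frac{1}{n^r}, \quad r > 0,\; n = 1, 2, \ldots.$$

Then $E|X_n|^r = 1$, so that $X_n \stackrel{r}{\nrightarrow} 0$. We show that $X_n \xrightarrow{P} 0$.

$$P\{|X_n| > \varepsilon\} = \begin{cases} P\{X_n = n\} & \text{if } \varepsilon < n \\ 0 & \text{if } \varepsilon \ge n \end{cases} \;\to 0 \quad \text{as } n \to \infty.$$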

Theorem 8. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{2} X$. Then $EX_n \to
EX$ and $EX_n^2 \to EX^2$ as $n \to \infty$.

Proof. We have

$$|E(X_n - X)| \le E|X_n - X| \le E^{1/2}|X_n - X|^2 \to 0 \quad \text{as } n \to \infty.$$

To see that $EX_n^2 \to EX^2$ (see also Theorem 9), we write

$$EX_n^2 = E(X_n - X)^2 + EX^2 + 2E\{X(X_n - X)\}$$

and note that

$$|E\{X(X_n - X)\}| \le \sqrt{EX^2\,E(X_n - X)^2}$$

by the Cauchy-Schwarz inequality. The result follows on passing to the limits.
We get, in addition, that $X_n \xrightarrow{2} X$ implies that $\operatorname{var}(X_n) \to \operatorname{var}(X)$.


Corollary. Let $\{X_m\}$, $\{Y_n\}$ be two sequences of RVs such that $X_m \xrightarrow{2} X$,
$Y_n \xrightarrow{2} Y$. Then $E(X_m Y_n) \to E(XY)$ as $m, n \to \infty$.


The proof is left to the reader.
As a simple consequence of Theorem 8 and its corollary we see that $X_m \xrightarrow{2} X$,
$Y_n \xrightarrow{2} Y$ together imply that $\operatorname{cov}(X_m, Y_n) \to \operatorname{cov}(X, Y)$.
Theorem 9. If $X_n \xrightarrow{r} X$, then $E|X_n|^r \to E|X|^r$.

Proof. Let $0 < r \le 1$. Then

$$E|X_n|^r = E|X_n - X + X|^r \le E|X_n - X|^r + E|X|^r,$$

so that

$$E|X_n|^r - E|X|^r \le E|X_n - X|^r.$$

Interchanging $X_n$ and $X$, we get

$$E|X|^r - E|X_n|^r \le E|X_n - X|^r.$$

It follows that

$$\big|E|X|^r - E|X_n|^r\big| \le E|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$

For $r > 1$, we use Minkowski's inequality and obtain

$$[E|X_n|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X|^r]^{1/r}$$

and

$$[E|X|^r]^{1/r} \le [E|X_n - X|^r]^{1/r} + [E|X_n|^r]^{1/r}.$$

It follows that

$$\big|E^{1/r}|X_n|^r - E^{1/r}|X|^r\big| \le E^{1/r}|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$

This completes the proof.

Theorem 10. Let $r > s$. Then $X_n \xrightarrow{r} X \Rightarrow X_n \xrightarrow{s} X$.

Proof. From Theorem 3.4.3 it follows that for $s < r$,

$$E|X_n - X|^s \le [E|X_n - X|^r]^{s/r} \to 0 \quad \text{as } n \to \infty,$$

since $X_n \xrightarrow{r} X$.


Remark 4. Clearly, the converse to Theorem 10 cannot hold, since $E|X|^s < \infty$ for $s < r$ does not imply that $E|X|^r < \infty$.


Remark 5. In view of Theorem 9, it follows that $X_n \xrightarrow{r} X \Rightarrow E|X_n|^s \to E|X|^s$ for $s \le r$.


Definition 4.† Let $\{X_n\}$ be a sequence of RVs. We say that $X_n$ converges almost surely (a.s.) to an RV $X$ if and only if

(4) $P\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1,$

and we write $X_n \xrightarrow{a.s.} X$ or $X_n \to X$ with probability 1.


†May be omitted on the first reading.


The following result elucidates Definition 4. 


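Theorem 11. $X_n \xrightarrow{a.s.} X$ if and only if $\lim_{n\to\infty} P\{\sup_{m \ge n} |X_m - X| > \varepsilon\} = 0$ for all $\varepsilon > 0$.

Proof. Since $X_n \xrightarrow{a.s.} X \Leftrightarrow X_n - X \xrightarrow{a.s.} 0$, it will be sufficient to show the equivalence of

(a) $X_n \xrightarrow{a.s.} 0$ and
(b) $\lim_{n\to\infty} P\{\sup_{m \ge n} |X_m| > \varepsilon\} = 0$.

Let us suppose that (a) holds. Let $\varepsilon > 0$, and write

$$A_n(\varepsilon) = \Big\{\sup_{m \ge n} |X_m| > \varepsilon\Big\} \quad \text{and} \quad C = \Big\{\lim_{n\to\infty} X_n = 0\Big\}.$$

Also write $B_n(\varepsilon) = C \cap A_n(\varepsilon)$, and note that $B_{n+1}(\varepsilon) \subset B_n(\varepsilon)$, and the limit set $\bigcap_{n=1}^{\infty} B_n(\varepsilon) = \emptyset$. It follows that

$$\lim_{n\to\infty} PB_n(\varepsilon) = P\Big\{\bigcap_{n=1}^{\infty} B_n(\varepsilon)\Big\} = 0.$$

Since $PC = 1$, $PC^c = 0$, and we have

$$\begin{aligned}
PB_n(\varepsilon) = P(A_n \cap C) &= 1 - P(C^c \cup A_n^c) \\
&= 1 - PC^c - PA_n^c + P(C^c \cap A_n^c) \\
&= PA_n + P(C^c \cap A_n^c) \\
&= PA_n.
\end{aligned}$$

It follows that (b) holds.

Conversely, let $\lim_{n\to\infty} PA_n(\varepsilon) = 0$, and write

$$D(\varepsilon) = \Big\{\limsup_{n\to\infty} |X_n| > \varepsilon > 0\Big\}.$$

Since $D(\varepsilon) \subset A_n(\varepsilon)$ for $n = 1, 2, \ldots$, it follows that $PD(\varepsilon) = 0$. Also,

$$C^c = \Big\{\lim_{n\to\infty} X_n \ne 0\Big\} \subset \bigcup_{k=1}^{\infty} \Big\{\limsup_{n\to\infty} |X_n| > \frac{1}{k}\Big\},$$

so that

$$1 - PC \le \sum_{k=1}^{\infty} PD\Big(\frac{1}{k}\Big) = 0,$$

and (a) holds.

Remark 6. Thus $X_n \xrightarrow{a.s.} 0$ means that for $\varepsilon > 0$, $\eta > 0$ arbitrary, we can find an $n_0$ such that

(5) $P\Big\{\sup_{n \ge n_0} |X_n| > \varepsilon\Big\} < \eta.$

Indeed, we can write, equivalently, that

(6) $\lim_{n_0 \to \infty} P\Big[\bigcup_{n \ge n_0} \{|X_n| > \varepsilon\}\Big] = 0.$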


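Theorem 12. $X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{P} X$.

Proof. By Remark 6, $X_n \xrightarrow{a.s.} X$ implies that for arbitrary $\varepsilon > 0$, $\eta > 0$, we can choose an $n_0 = n_0(\varepsilon, \eta)$ such that

$$P\Big[\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\}\Big] \ge 1 - \eta.$$

Clearly,

$$\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\} \subset \{|X_n - X| \le \varepsilon\} \quad \text{for } n \ge n_0.$$

It follows that for $n \ge n_0$,

$$P\{|X_n - X| \le \varepsilon\} \ge P\Big[\bigcap_{n = n_0}^{\infty} \{|X_n - X| \le \varepsilon\}\Big] \ge 1 - \eta,$$

that is,

$$P\{|X_n - X| > \varepsilon\} < \eta \quad \text{for } n \ge n_0,$$

which is the same as saying that $X_n \xrightarrow{P} X$.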
That the converse of Theorem 12 does not hold is shown in the following example. 


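Example 8. For each positive integer $n$ there exist integers $m$ and $k$ (uniquely determined) such that

$$n = 2^k + m, \qquad 0 \le m < 2^k, \quad k = 0, 1, 2, \ldots.$$

Thus, for $n = 1$, $k = 0$ and $m = 0$; for $n = 5$, $k = 2$ and $m = 1$; and so on. Define RVs $X_n$ for $n = 1, 2, \ldots$ on $\Omega = [0, 1]$ by

$$X_n(\omega) = \begin{cases} 2^k, & \dfrac{m}{2^k} \le \omega < \dfrac{m+1}{2^k}, \\[1ex] 0, & \text{otherwise.} \end{cases}$$

Let the probability distribution of $X_n$ be given by $P\{I\} =$ length of the interval $I \subseteq \Omega$. Thus

$$P\{X_n = 2^k\} = \frac{1}{2^k} \quad \text{and} \quad P\{X_n = 0\} = 1 - \frac{1}{2^k}.$$

The limit $\lim_{n\to\infty} X_n(\omega)$ does not exist for any $\omega \in \Omega$, so that $X_n$ does not converge almost surely. But

$$P\{|X_n| > \varepsilon\} = P\{X_n > \varepsilon\} = \begin{cases} 0 & \text{if } \varepsilon \ge 2^k, \\[0.5ex] \dfrac{1}{2^k} & \text{if } 0 < \varepsilon < 2^k, \end{cases}$$

and we see that

$$P\{|X_n| > \varepsilon\} \to 0 \quad \text{as } n \text{ (and hence } k) \to \infty.$$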
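The construction in Example 8 is easy to trace numerically. The following is a minimal sketch, assuming the decomposition $n = 2^k + m$ with $0 \le m < 2^k$ and the indicator-type definition of $X_n$ on $[0, 1]$; the helper names `decompose`, `x_n`, and `prob_nonzero` are hypothetical.

```python
def decompose(n):
    """Return (k, m) with n = 2**k + m and 0 <= m < 2**k."""
    k = n.bit_length() - 1
    return k, n - 2 ** k

def x_n(n, omega):
    """Value of X_n at the sample point omega in [0, 1]."""
    k, m = decompose(n)
    return 2 ** k if m / 2 ** k <= omega < (m + 1) / 2 ** k else 0

def prob_nonzero(n):
    """P{X_n != 0}: the length of the interval on which X_n = 2**k."""
    k, _ = decompose(n)
    return 1 / 2 ** k

# Along a fixed omega the path keeps returning to a nonzero value:
# exactly one index n in each block [2**k, 2**(k+1)) has X_n(omega) != 0.
omega = 0.3
hits_per_block = [
    sum(1 for n in range(2 ** k, 2 ** (k + 1)) if x_n(n, omega) != 0)
    for k in range(8)
]
```

For a fixed $\omega$ the path takes the value 0 for most indices but jumps to $2^k$ once in every dyadic block, so it has no limit, while `prob_nonzero(n)` shrinks to 0.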


Theorem 13. Let $\{X_n\}$ be a strictly decreasing sequence of positive RVs, and suppose that $X_n \xrightarrow{P} 0$. Then $X_n \xrightarrow{a.s.} 0$.


The proof is left as an exercise. 


Example 9. Let $\{X_n\}$ be a sequence of independent RVs defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n} \quad \text{and} \quad P\{X_n = 1\} = \frac{1}{n}, \qquad n = 1, 2, \ldots.$$

Then

$$E|X_n - 0|^2 = E|X_n|^2 = \frac{1}{n} \to 0 \quad \text{as } n \to \infty,$$

so that $X_n \xrightarrow{2} 0$. Also,

$$P\{X_n = 0 \text{ for every } m \le n \le n_0\} = \prod_{n=m}^{n_0} \Big(1 - \frac{1}{n}\Big) = \frac{m-1}{n_0},$$

which tends to zero as $n_0 \to \infty$ for all values of $m$. Thus $X_n$ does not converge to 0 with probability 1.

Example 10. Let $\{X_n\}$ be independent, defined by

$$P\{X_n = 0\} = 1 - \frac{1}{n^r} \quad \text{and} \quad P\{X_n = n\} = \frac{1}{n^r}, \qquad r \ge 2, \quad n = 1, 2, \ldots.$$

Then

$$P\{X_n = 0 \text{ for } m \le n \le n_0\} = \prod_{n=m}^{n_0} \Big(1 - \frac{1}{n^r}\Big).$$

As $n_0 \to \infty$, the infinite product converges to some nonzero quantity, which itself converges to 1 as $m \to \infty$. Thus $X_n \xrightarrow{a.s.} 0$. However, $E|X_n|^r = 1$, and $X_n \overset{r}{\nrightarrow} 0$ as $n \to \infty$.
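The two products in Examples 9 and 10 can be evaluated exactly with rational arithmetic. This is a small sketch under the stated models (independent $X_n$ with $P\{X_n = 0\} = 1 - n^{-r}$, with $r = 1$ for Example 9 and $r = 2$ as one admissible case of Example 10); the function name `prob_all_zero` is illustrative.

```python
from fractions import Fraction

def prob_all_zero(m, n0, r=1):
    """Exact P{X_n = 0 for every m <= n <= n0} when P{X_n = 0} = 1 - n**-r."""
    p = Fraction(1)
    for n in range(m, n0 + 1):
        p *= 1 - Fraction(1, n ** r)
    return p

# Example 9 (r = 1): the product telescopes to (m - 1) / n0, which -> 0.
ex9 = [prob_all_zero(5, n0) for n0 in (10, 100, 1000)]

# Example 10 with r = 2: the product telescopes to ((m-1)/m) * ((n0+1)/n0),
# whose limit (m - 1)/m is nonzero and tends to 1 as m grows.
ex10 = [prob_all_zero(m, 1000, r=2) for m in (5, 50, 500)]
```

With $r = 1$ the probability of staying at 0 from index $m$ onward vanishes, whereas with $r = 2$ it approaches 1 as $m$ increases, matching the contrast between the two examples.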


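Example 11. Let $\{X_n\}$ be a sequence of RVs with $P\{X_n = \pm 1/n\} = \tfrac12$. Then $E|X_n|^r = 1/n^r \to 0$ as $n \to \infty$, and $X_n \xrightarrow{r} 0$. For $j < k$, $|X_j| > |X_k|$, so that $\{|X_k| > \varepsilon\} \subset \{|X_j| > \varepsilon\}$. It follows that

$$\bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\} = \{|X_n| > \varepsilon\}.$$

Choosing $n > 1/\varepsilon$, we see that

$$P\Big[\bigcup_{j=n}^{\infty} \{|X_j| > \varepsilon\}\Big] = P\{|X_n| > \varepsilon\} \le P\Big\{|X_n| > \frac{1}{n}\Big\} = 0,$$

and (6) implies that $X_n \xrightarrow{a.s.} 0$.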


Remark 7. In Theorem 6.4.3 we prove a result that is sometimes useful in proving a.s. convergence of a sequence of RVs.


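Theorem 14. Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of RVs. Then

$$|X_n - Y_n| \xrightarrow{P} 0 \quad \text{and} \quad Y_n \xrightarrow{L} Y \Rightarrow X_n \xrightarrow{L} Y.$$

Proof. Let $x$ be a point of continuity of the DF of $Y$ and $\varepsilon > 0$. Then

$$\begin{aligned}
P\{X_n \le x\} &= P\{Y_n \le x + Y_n - X_n\} \\
&= P\{Y_n \le x + Y_n - X_n;\ Y_n - X_n \le \varepsilon\} + P\{Y_n \le x + Y_n - X_n;\ Y_n - X_n > \varepsilon\} \\
&\le P\{Y_n \le x + \varepsilon\} + P\{Y_n - X_n > \varepsilon\}.
\end{aligned}$$

It follows that

$$\limsup_{n\to\infty} P\{X_n \le x\} \le \limsup_{n\to\infty} P\{Y_n \le x + \varepsilon\}.$$

Similarly,

$$\liminf_{n\to\infty} P\{X_n \le x\} \ge \liminf_{n\to\infty} P\{Y_n \le x - \varepsilon\}.$$

Since $\varepsilon > 0$ is arbitrary and $x$ is a continuity point of $P\{Y \le x\}$, we get the result by letting $\varepsilon \to 0$.


Corollary. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{L} X$.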


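Theorem 15 (Slutsky's Theorem). Let $\{X_n, Y_n\}$, $n = 1, 2, \ldots$, be a sequence of pairs of RVs, and let $c$ be a constant. Then

(a) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n + Y_n \xrightarrow{L} X + c$;

(b) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n Y_n \xrightarrow{L} cX$ if $c \ne 0$, and $X_n Y_n \xrightarrow{P} 0$ if $c = 0$;

(c) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n / Y_n \xrightarrow{L} X/c$ if $c \ne 0$.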


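Proof. (a) $X_n \xrightarrow{L} X \Rightarrow X_n + c \xrightarrow{L} X + c$ (Theorem 3). Also, $Y_n - c = (Y_n + X_n) - (X_n + c) \xrightarrow{P} 0$. A simple use of Theorem 14 shows that

$$X_n + Y_n \xrightarrow{L} X + c.$$

(b) We first consider the case where $c = 0$. We have for any fixed number $k > 0$,

$$\begin{aligned}
P\{|X_n Y_n| > \varepsilon\} &= P\Big\{|X_n Y_n| > \varepsilon,\ |Y_n| \le \frac{\varepsilon}{k}\Big\} + P\Big\{|X_n Y_n| > \varepsilon,\ |Y_n| > \frac{\varepsilon}{k}\Big\} \\
&\le P\{|X_n| > k\} + P\Big\{|Y_n| > \frac{\varepsilon}{k}\Big\}.
\end{aligned}$$

Since $Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{L} X$, it follows that for any fixed $k > 0$,

$$\limsup_{n\to\infty} P\{|X_n Y_n| > \varepsilon\} \le P\{|X| > k\}.$$

Since $k$ is arbitrary, we can make $P\{|X| > k\}$ as small as we please by choosing $k$ large. It follows that

$$X_n Y_n \xrightarrow{P} 0.$$

Now, let $c \ne 0$. Then

$$X_n Y_n - c X_n = X_n (Y_n - c),$$

and since $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c$, $X_n(Y_n - c) \xrightarrow{P} 0$. Using Theorem 14, we get the result that

$$X_n Y_n \xrightarrow{L} cX.$$

(c) $Y_n \xrightarrow{P} c$, and $c \ne 0 \Rightarrow Y_n^{-1} \xrightarrow{P} c^{-1}$. It follows that $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n Y_n^{-1} \xrightarrow{L} c^{-1} X$, and the proof of the theorem is complete.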


As an application of Theorem 15, we present the following example. Many more 
examples appear in Chapter 7. 


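Example 12. Let $X_1, X_2, \ldots$ be iid RVs with common law $\mathcal{N}(0, 1)$. We shall determine the limiting distribution of the RV

$$W_n = \sqrt{n}\, \frac{X_1 + X_2 + \cdots + X_n}{X_1^2 + X_2^2 + \cdots + X_n^2}.$$

Let us write

$$U_n = \frac{1}{\sqrt{n}} (X_1 + X_2 + \cdots + X_n) \quad \text{and} \quad V_n = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}.$$

Then

$$W_n = \frac{U_n}{V_n}.$$

For the MGF of $U_n$, we have

$$M_{U_n}(t) = \prod_{i=1}^{n} E e^{tX_i/\sqrt{n}} = \prod_{i=1}^{n} e^{t^2/2n} = e^{t^2/2},$$

so that $U_n$ is an $\mathcal{N}(0, 1)$ variate (see also Corollary 2 to Theorem 5.3.22). It follows that $U_n \xrightarrow{L} Z$, where $Z$ is an $\mathcal{N}(0, 1)$ RV. As for $V_n$, we note that each $X_i^2$ is a chi-square variate with 1 d.f. Thus

$$M_{V_n}(t) = \prod_{i=1}^{n} \Big(\frac{1}{1 - 2t/n}\Big)^{1/2} = \Big(1 - \frac{2t}{n}\Big)^{-n/2}, \qquad t < \frac{n}{2},$$

which is the MGF of a gamma variate with parameters $\alpha = n/2$ and $\beta = 2/n$. Thus the density function of $V_n$ is given by

$$f_{V_n}(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\,(2/n)^{n/2}}\, x^{n/2 - 1} e^{-nx/2}, & 0 < x < \infty, \\[1ex] 0, & \text{otherwise.} \end{cases}$$

We will show that $V_n \xrightarrow{P} 1$. We have for any $\varepsilon > 0$,

$$P\{|V_n - 1| > \varepsilon\} \le \frac{\operatorname{var}(V_n)}{\varepsilon^2} = \Big(\frac{n}{2}\Big)\Big(\frac{2}{n}\Big)^2 \frac{1}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$

We have thus shown that

$$U_n \xrightarrow{L} Z \quad \text{and} \quad V_n \xrightarrow{P} 1.$$

It follows by Theorem 15(c) that $W_n = U_n / V_n \xrightarrow{L} Z$, where $Z$ is an $\mathcal{N}(0, 1)$ RV.

Later we will see that the condition that the $X_i$'s be $\mathcal{N}(0, 1)$ is not needed. All we need is that $E|X_i|^2 < \infty$.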
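A small Monte Carlo experiment can make the conclusion of Example 12 tangible. The sketch below is only an illustration under assumed settings (sample size `n=400`, `reps=2000` replications, a fixed seed); it draws standard normal samples, forms $W_n = U_n / V_n$, and compares the empirical mean and variance of $W_n$ with the values 0 and 1 of the limiting $\mathcal{N}(0, 1)$ law.

```python
import math
import random

def w_statistic(n, rng):
    """One draw of W_n = U_n / V_n from n iid N(0, 1) observations."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = sum(xs) / math.sqrt(n)          # U_n: exactly N(0, 1)
    v = sum(x * x for x in xs) / n      # V_n: close to 1 for large n
    return u / v

def simulate(n=400, reps=2000, seed=12345):
    """Return the sample mean and sample variance of W_n over reps draws."""
    rng = random.Random(seed)
    ws = [w_statistic(n, rng) for _ in range(reps)]
    mean = sum(ws) / reps
    var = sum((w - mean) ** 2 for w in ws) / (reps - 1)
    return mean, var

mean_w, var_w = simulate()
```

Agreement of the first two moments does not by itself establish convergence in law; it is offered only as a sanity check on the Slutsky argument.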


PROBLEMS 6.2 


1. Let $X_1, X_2, \ldots$ be a sequence of RVs with corresponding DFs given by $F_n(x) = 0$ if $x < -n$, $= (x + n)/2n$ if $-n \le x < n$, and $= 1$ if $x \ge n$. Does $F_n$ converge to a DF?

2. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(0, 1)$ RVs. Consider the sequence of RVs $\{\bar{X}_n\}$, where $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$. Let $F_n$ be the DF of $\bar{X}_n$, $n = 1, 2, \ldots$. Find $\lim_{n\to\infty} F_n(x)$. Is this limit a DF?

3. Let $X_1, X_2, \ldots$ be iid $U(0, \theta)$ RVs. Let $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, and consider the sequence $Y_n = nX_{(1)}$. Does $Y_n$ converge in distribution to some RV $Y$? If so, find the DF of RV $Y$.

4. Let $X_1, X_2, \ldots$ be iid RVs with common absolutely continuous DF $F$. Let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$, and consider the sequence of RVs $Y_n = n[1 - F(X_{(n)})]$. Find the limiting DF of $Y_n$.

5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common PDF $f(x) = e^{-x+\theta}$ if $x \ge \theta$, and $= 0$ if $x < \theta$. Write $\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_i$.

(a) Show that $\bar{X}_n \xrightarrow{P} 1 + \theta$.
(b) Show that $\min\{X_1, X_2, \ldots, X_n\} \xrightarrow{P} \theta$.

6. Let $X_1, X_2, \ldots$ be iid $U[0, \theta]$ RVs. Show that $\max\{X_1, X_2, \ldots, X_n\} \xrightarrow{P} \theta$.

7. Let $\{X_n\}$ be a sequence of RVs such that $X_n \xrightarrow{L} X$. Let $a_n$ be a sequence of positive constants such that $a_n \to \infty$ as $n \to \infty$. Show that $a_n^{-1} X_n \xrightarrow{P} 0$.

8. Let $\{X_n\}$ be a sequence of RVs such that $P\{|X_n| \le k\} = 1$ for all $n$ and some constant $k > 0$. Suppose that $X_n \xrightarrow{P} X$. Show that $X_n \xrightarrow{r} X$ for any $r > 0$.

9. Let $X_1, X_2, \ldots, X_{2n}$ be iid $\mathcal{N}(0, 1)$ RVs. Define

$$V_n = X_1^2 + X_2^2 + \cdots + X_n^2 \quad \text{and} \quad Z_n = \frac{U_n}{V_n}.$$

Find the limiting distribution of $Z_n$.

10. Let $\{X_n\}$ be a sequence of geometric RVs with parameter $\lambda/n$, $n > \lambda > 0$. Also, let $Z_n = X_n/n$. Show that $Z_n \xrightarrow{L} G(1, 1/\lambda)$ as $n \to \infty$. (Prochaska [80])

11. Let $X_n$ be a sequence of RVs such that $X_n \xrightarrow{a.s.} 0$, and let $c_n$ be a sequence of real numbers such that $c_n \to 0$ as $n \to \infty$. Show that $X_n + c_n \xrightarrow{a.s.} 0$.

12. Does convergence almost surely imply convergence of moments?

13. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F$, and write $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$, $n = 1, 2, \ldots$.

(a) For $\alpha > 0$, $\lim_{x\to\infty} x^{\alpha} P\{X_1 > x\} = b > 0$. Find the limiting distribution of $(bn)^{-1/\alpha} X_{(n)}$. Also, find the PDF corresponding to the limiting DF and compute its moments.

(b) If $F$ satisfies

$$\lim_{x\to\infty} e^{x}[1 - F(x)] = b > 0,$$

find the limiting DF of $X_{(n)} - \log(bn)$ and compute the corresponding PDF and the MGF.

(c) If $X_1$ is bounded above by $x_0$ with probability 1, and for some $\alpha > 0$

$$\lim_{x \to x_0 -} (x_0 - x)^{-\alpha} [1 - F(x)] = b > 0,$$

find the limiting distribution of $(bn)^{1/\alpha} \{X_{(n)} - x_0\}$, the corresponding PDF, and the moments of the limiting distribution.

(The remarkable result above, due to Gnedenko [33], exhausts all limiting distributions of $X_{(n)}$ with suitable norming and centering.)

14. Let $\{F_n\}$ be a sequence of DFs that converges weakly to a DF $F$ that is continuous everywhere. Show that $F_n(x)$ converges to $F(x)$ uniformly.

15. Prove Theorem 1.

16. Prove Theorem 6.

17. Prove Theorem 13.

18. Prove Corollary 1 to Theorem 8.

19. Let $V$ be the class of all random variables defined on a probability space with finite expectations, and for $X \in V$ define

$$\rho(X) = E\Big\{\frac{|X|}{1 + |X|}\Big\}.$$

Show the following:
(a) $\rho(X + Y) \le \rho(X) + \rho(Y)$; $\rho(\sigma X) \le \max(|\sigma|, 1)\rho(X)$.
(b) $d(X, Y) = \rho(X - Y)$ is a distance function on $V$ (assuming that we identify RVs that are a.s. equal).
(c) $\lim_{n\to\infty} d(X_n, X) = 0 \Leftrightarrow X_n \xrightarrow{P} X$.

20. For the following sequences of RVs $\{X_n\}$, investigate convergence in probability and convergence in $r$th mean.
(a) $X_n \sim C(1/n, 0)$.
(b) $P(X_n = e^n) = 1/n^2$, $P(X_n = 0) = 1 - 1/n^2$.


6.3 WEAK LAW OF LARGE NUMBERS 


Let $\{X_n\}$ be a sequence of RVs. Write $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. In this section we answer the following question in the affirmative: Do there exist sequences of constants $A_n$ and $B_n > 0$, $B_n \to \infty$ as $n \to \infty$, such that the sequence of RVs $B_n^{-1}(S_n - A_n)$ converges in probability to 0 as $n \to \infty$?


Definition 1. Let $\{X_n\}$ be a sequence of RVs, and let $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. We say that $\{X_n\}$ obeys the weak law of large numbers (WLLN) with respect to the sequence of constants $\{B_n\}$, $B_n > 0$, $B_n \uparrow \infty$, if there exists a sequence of real constants $A_n$ such that $B_n^{-1}(S_n - A_n) \xrightarrow{P} 0$ as $n \to \infty$. $A_n$ are called centering constants, and $B_n$ norming constants.


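Theorem 1. Let $\{X_n\}$ be a sequence of pairwise uncorrelated RVs with $EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots$. If $\sum_{i=1}^{n} \sigma_i^2 \to \infty$ as $n \to \infty$, we can choose $A_n = \sum_{k=1}^{n} \mu_k$ and $B_n = \sum_{i=1}^{n} \sigma_i^2$, that is,

$$\sum_{i=1}^{n} \frac{X_i - \mu_i}{\sum_{i=1}^{n} \sigma_i^2} \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$

Proof. We have, by Chebychev's inequality,

$$P\Big\{\Big|\sum_{i=1}^{n} (X_i - \mu_i)\Big| > \varepsilon \sum_{i=1}^{n} \sigma_i^2\Big\} \le \frac{E\{\sum_{i=1}^{n} (X_i - \mu_i)\}^2}{\varepsilon^2 \big(\sum_{i=1}^{n} \sigma_i^2\big)^2} = \frac{1}{\varepsilon^2 \sum_{i=1}^{n} \sigma_i^2} \to 0 \quad \text{as } n \to \infty.$$

Corollary 1. If the $X_n$'s are identically distributed and pairwise uncorrelated with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$, we can choose $A_n = n\mu$ and $B_n = n\sigma^2$.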


Corollary 2. In Theorem 1 we can choose $B_n = n$, provided that $n^{-2} \sum_{i=1}^{n} \sigma_i^2 \to 0$ as $n \to \infty$.


Corollary 3. In Corollary 1 we can take $A_n = n\mu$ and $B_n = n$, since $n\sigma^2/n^2 \to 0$ as $n \to \infty$. Thus, if $\{X_n\}$ are pairwise-uncorrelated identically distributed RVs with finite variance, $S_n/n \xrightarrow{P} \mu$.


Example 1. Let $X_1, X_2, \ldots$ be iid RVs with common law $b(1, p)$. Then $EX_i = p$, $\operatorname{var}(X_i) = p(1 - p)$, and we have

$$\frac{S_n}{n} \xrightarrow{P} p \quad \text{as } n \to \infty.$$

Note that $S_n/n$ is the proportion of successes in $n$ trials.
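For the Bernoulli case of Example 1, the tail probability $P\{|S_n/n - p| > \varepsilon\}$ can be computed exactly from the binomial PMF and compared with the Chebychev bound $p(1-p)/(n\varepsilon^2)$ that drives the proof of Theorem 1. The sketch below assumes $0 < p < 1$; the helper names and the chosen values of $n$, $p$, and $\varepsilon$ are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """P{S_n = k} for S_n ~ b(n, p), computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def tail_prob(n, p, eps):
    """Exact P{|S_n/n - p| > eps}."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) > eps)

def chebyshev_bound(n, p, eps):
    """Upper bound var(S_n/n) / eps**2 = p(1-p) / (n * eps**2)."""
    return p * (1 - p) / (n * eps ** 2)

p, eps = 0.3, 0.05
table = [(n, tail_prob(n, p, eps), chebyshev_bound(n, p, eps))
         for n in (10, 100, 1000)]
```

Each row of `table` holds $n$, the exact tail probability, and the bound; both columns shrink as $n$ grows, with the exact probability well below the bound.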


Hereafter, we shall be interested mainly in the case where $B_n = n$. When we say that $\{X_n\}$ obeys the WLLN, this is so with respect to the sequence $\{n\}$.


Theorem 2. Let $\{X_n\}$ be any sequence of RVs. Write $Y_n = n^{-1} \sum_{k=1}^{n} X_k$. A necessary and sufficient condition for the sequence $\{X_n\}$ to satisfy the weak law of large numbers is that

(1) $E\Big\{\dfrac{Y_n^2}{1 + Y_n^2}\Big\} \to 0$ as $n \to \infty$.


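Proof. For any two positive numbers $a$, $b$, $a \ge b > 0$, we have

(2) $\Big(\dfrac{a}{1 + a}\Big)\Big(\dfrac{1 + b}{b}\Big) \ge 1.$

Let $A = \{|Y_n| \ge \varepsilon\}$. Then $\omega \in A \Rightarrow |Y_n|^2 \ge \varepsilon^2 > 0$. Using (2), we see that $\omega \in A$ implies that

$$\frac{Y_n^2}{1 + Y_n^2} \cdot \frac{1 + \varepsilon^2}{\varepsilon^2} \ge 1.$$

It follows that

$$PA \le P\Big\{\frac{Y_n^2}{1 + Y_n^2} \ge \frac{\varepsilon^2}{1 + \varepsilon^2}\Big\} \le \frac{E\{Y_n^2/(1 + Y_n^2)\}}{\varepsilon^2/(1 + \varepsilon^2)} \to 0 \quad \text{as } n \to \infty,$$

where the second inequality holds by Markov's inequality. That is,

$$Y_n \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$

Conversely, we will show that for every $\varepsilon > 0$,

(3) $P\{|Y_n| \ge \varepsilon\} \ge E\Big\{\dfrac{Y_n^2}{1 + Y_n^2}\Big\} - \varepsilon^2.$

We will prove (3) for the case in which $Y_n$ is of the continuous type. The discrete case being similar, we ask the reader to complete the proof. If $Y_n$ has PDF $f_n(y)$, then

$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{y^2}{1 + y^2} f_n(y)\, dy &= \Big(\int_{|y| > \varepsilon} + \int_{|y| \le \varepsilon}\Big) \frac{y^2}{1 + y^2} f_n(y)\, dy \\
&\le P\{|Y_n| > \varepsilon\} + \int_{-\varepsilon}^{\varepsilon} \Big(1 - \frac{1}{1 + y^2}\Big) f_n(y)\, dy \\
&\le P\{|Y_n| > \varepsilon\} + \frac{\varepsilon^2}{1 + \varepsilon^2} \\
&\le P\{|Y_n| \ge \varepsilon\} + \varepsilon^2,
\end{aligned}$$

which is (3).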


Remark 1. Since condition (1) applies not to the individual variables but to their sum, Theorem 2 is of limited use. We note, however, that all weak laws of large numbers obtained as corollaries to Theorem 1 follow easily from Theorem 2 (Problem 6).


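Example 2. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all $i$, and $\operatorname{cov}(X_i, X_j) = \rho$ if $|j - i| = 1$, and $= 0$ otherwise. Then $S_n = \sum_{k=1}^{n} X_k$ is $\mathcal{N}(0, \sigma^2)$, where

$$\sigma^2 = \operatorname{var}(S_n) = n + 2(n - 1)\rho,$$

and

$$\begin{aligned}
E\Big\{\frac{S_n^2}{n^2 + S_n^2}\Big\} &= \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} \frac{x^2}{n^2 + x^2}\, e^{-x^2/2\sigma^2}\, dx \\
&= \frac{2}{\sqrt{2\pi}} \int_0^{\infty} \frac{y^2[n + 2(n - 1)\rho]}{n^2 + y^2[n + 2(n - 1)\rho]}\, e^{-y^2/2}\, dy \\
&\le \frac{n + 2(n - 1)\rho}{n^2} \int_0^{\infty} \frac{2}{\sqrt{2\pi}}\, y^2 e^{-y^2/2}\, dy \to 0 \quad \text{as } n \to \infty.
\end{aligned}$$

It follows from Theorem 2 that $n^{-1} S_n \xrightarrow{P} 0$. We invite the reader to compare this result to that of Problem 6.5.6.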


Example 3. Let $X_1, X_2, \ldots$ be iid $C(1, 0)$ RVs. We have seen (corollary to Theorem 5.3.18) that $n^{-1} S_n \sim C(1, 0)$, so that $n^{-1} S_n$ does not converge in probability to 0. It follows that the WLLN does not hold (see also Problem 10).


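Let $X_1, X_2, \ldots$ be an arbitrary sequence of RVs, and let $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \ldots$. Let us truncate each $X_i$ at $c > 0$, that is, let

$$X_i^c = \begin{cases} X_i & \text{if } |X_i| \le c, \\ 0 & \text{if } |X_i| > c. \end{cases}$$

Write

$$S_n^c = \sum_{i=1}^{n} X_i^c \quad \text{and} \quad m_n = \sum_{i=1}^{n} EX_i^c.$$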

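Lemma 1. For any $\varepsilon > 0$,

(4) $P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}.$

Proof. We have

$$\begin{aligned}
P\{|S_n - m_n| > \varepsilon\} &= P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| \le c \text{ for } k = 1, 2, \ldots, n\} \\
&\quad + P\{|S_n - m_n| > \varepsilon \text{ and } |X_k| > c \text{ for at least one } k,\ k = 1, 2, \ldots, n\} \\
&\le P\{|S_n^c - m_n| > \varepsilon\} + P\{|X_k| > c \text{ for at least one } k,\ k = 1, 2, \ldots, n\} \\
&\le P\{|S_n^c - m_n| > \varepsilon\} + \sum_{k=1}^{n} P\{|X_k| > c\}.
\end{aligned}$$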

Corollary. If $X_1, X_2, \ldots, X_n$ are exchangeable, then

(5) $P\{|S_n - m_n| > \varepsilon\} \le P\{|S_n^c - m_n| > \varepsilon\} + nP\{|X_1| > c\}.$

If, in addition, the RVs $X_1, X_2, \ldots, X_n$ are independent, then

(6) $P\{|S_n - m_n| > \varepsilon\} \le \dfrac{nE(X_1^c)^2}{\varepsilon^2} + nP\{|X_1| > c\}.$

Inequality (6) yields the following important theorem.


Theorem 3. Let $\{X_n\}$ be a sequence of iid RVs with common finite mean $\mu = EX_1$. Then

$$\frac{S_n}{n} \xrightarrow{P} \mu \quad \text{as } n \to \infty.$$
Proof. Let us take $c = n$ in (6) and replace $\varepsilon$ by $n\varepsilon$; then we have

$$P\{|S_n - m_n| > n\varepsilon\} \le \frac{1}{n\varepsilon^2} E(X_1^n)^2 + nP\{|X_1| > n\},$$

where $X_1^n$ is $X_1$ truncated at $n$.

First note that $E|X_1| < \infty \Rightarrow nP\{|X_1| > n\} \to 0$ as $n \to \infty$. Now (see remarks following Lemma 3.2.1)

$$E(X_1^n)^2 \le 2\int_0^n xP\{|X_1| > x\}\, dx = 2\Big(\int_0^A + \int_A^n\Big) xP\{|X_1| > x\}\, dx,$$

where $A$ is chosen sufficiently large that

$$xP\{|X_1| > x\} < \frac{\delta}{2} \quad \text{for all } x \ge A, \quad \delta > 0 \text{ arbitrary.}$$

Thus

$$E(X_1^n)^2 \le c + \delta \int_A^n dx \le c + n\delta,$$

where $c$ is a constant. It follows that

$$\frac{1}{n\varepsilon^2} E(X_1^n)^2 \le \frac{c}{n\varepsilon^2} + \frac{\delta}{\varepsilon^2},$$

and since $\delta$ is arbitrary, $(1/n\varepsilon^2) E(X_1^n)^2$ can be made arbitrarily small for sufficiently large $n$. The proof is now completed by the simple observation that since $EX_1 = \mu$, $m_n/n = EX_1^n \to \mu$ as $n \to \infty$.

We emphasize that in Theorem 3 we require only that $E|X_1| < \infty$; nothing is said about the variance. Theorem 3 is due to Khintchine.


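Example 4. Let $X_1, X_2, \ldots$ be iid RVs with $E|X_1|^k < \infty$ for some positive integer $k$. Then

$$\sum_{j=1}^{n} \frac{X_j^k}{n} \xrightarrow{P} EX_1^k \quad \text{as } n \to \infty.$$

Thus, if $EX_1^2 < \infty$, then $\sum_{j=1}^{n} X_j^2/n \xrightarrow{P} EX_1^2$; and since $\big(\sum_{j=1}^{n} X_j/n\big)^2 \xrightarrow{P} (EX_1)^2$, it follows that

$$\frac{\sum X_j^2}{n} - \Big(\frac{\sum X_j}{n}\Big)^2 \xrightarrow{P} \operatorname{var}(X_1).$$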


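Example 5. Let $X_1, X_2, \ldots$ be iid RVs with common PDF

$$f(x) = \begin{cases} \dfrac{1 + \delta}{x^{2 + \delta}}, & x \ge 1, \\[1ex] 0, & x < 1, \end{cases} \qquad \delta > 0.$$

Then

$$E|X| = (1 + \delta) \int_1^{\infty} \frac{1}{x^{1 + \delta}}\, dx = \frac{1 + \delta}{\delta} < \infty,$$

and the law of large numbers holds, that is,

$$n^{-1} S_n \xrightarrow{P} \frac{1 + \delta}{\delta} \quad \text{as } n \to \infty.$$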
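Example 5 can be illustrated by simulation. The sketch below assumes the density $f(x) = (1+\delta)/x^{2+\delta}$ on $x \ge 1$, whose DF is $F(x) = 1 - x^{-(1+\delta)}$, and samples from it by inverting $F$; the value $\delta = 3$, the sample size, and the seed are arbitrary choices made for the illustration.

```python
import random

def sample_x(delta, rng):
    """Inverse-transform draw from F(x) = 1 - x**-(1 + delta), x >= 1."""
    u = 1.0 - rng.random()              # uniform on (0, 1], avoids u == 0
    return u ** (-1.0 / (1.0 + delta))

def sample_mean(n, delta, seed=2024):
    """n^{-1} S_n for n iid draws from the density of Example 5."""
    rng = random.Random(seed)
    return sum(sample_x(delta, rng) for _ in range(n)) / n

delta = 3.0
limit = (1.0 + delta) / delta           # the WLLN limit (1 + delta) / delta
estimate = sample_mean(200_000, delta)
```

For small $\delta$ the variance is infinite and the sample mean settles much more slowly, although Theorem 3 still guarantees convergence in probability.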
PROBLEMS 6.3 
1. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common uniform distribution on $[0, 1]$. Also, let $Z_n = \big(\prod_{i=1}^{n} X_i\big)^{1/n}$ be the geometric mean of $X_1, X_2, \ldots, X_n$, $n = 1, 2, \ldots$. Show that $Z_n \xrightarrow{P} c$, where $c$ is a constant. Find $c$.

2. Let $X_1, X_2, \ldots$ be iid RVs with finite second moment. Let

$$Y_n = \frac{2}{n(n + 1)} \sum_{i=1}^{n} iX_i.$$

Show that $Y_n \xrightarrow{P} EX_1$.

3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with $EX_i = \mu$ and $\operatorname{var}(X_i) = \sigma^2$. Let $S_k = \sum_{j=1}^{k} X_j$. Does the sequence $S_k$ obey the WLLN in the sense of Definition 1? If so, find the centering and the norming constants.

4. Let $\{X_n\}$ be a sequence of RVs for which $\operatorname{var}(X_n) < C$ for all $n$ and $\rho_{ij} = \operatorname{cov}(X_i, X_j) \to 0$ as $|i - j| \to \infty$. Show that the WLLN holds.

5. For the following sequences of independent RVs, does the WLLN hold?
(a) $P\{X_k = \pm 2^k\} = \tfrac12$.
(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.
(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.
(d) $P\{X_k = \pm 1/k\} = \tfrac12$.
(e) $P\{X_k = \pm\sqrt{k}\} = \tfrac12$.

6. Let $X_1, X_2, \ldots$ be a sequence of independent RVs such that $\operatorname{var}(X_k) < \infty$ for $k = 1, 2, \ldots$, and $(1/n^2) \sum_{k=1}^{n} \operatorname{var}(X_k) \to 0$ as $n \to \infty$. Prove the WLLN, using Theorem 2.

7. Let $X_n$ be a sequence of RVs with common finite variance $\sigma^2$. Suppose that the correlation coefficient between $X_i$ and $X_j$ is $< 0$ for all $i \ne j$. Show that the WLLN holds for the sequence $\{X_n\}$.

8. Let $\{X_n\}$ be a sequence of RVs such that $X_k$ is independent of $X_j$ for $j \ne k + 1$ or $j \ne k - 1$. If $\operatorname{var}(X_k) < C$ for all $k$, where $C$ is a constant, the WLLN holds for $\{X_k\}$.

9. For any sequence of RVs $\{X_n\}$, show that

$$\max_{1 \le k \le n} |X_k| \xrightarrow{P} 0 \Rightarrow n^{-1} S_n \xrightarrow{P} 0.$$

10. Let $X_1, X_2, \ldots$ be iid $C(1, 0)$ RVs. Use Theorem 2 to show that the weak law of large numbers does not hold. That is, show that

$$E\Big\{\frac{S_n^2}{n^2 + S_n^2}\Big\} \nrightarrow 0 \quad \text{as } n \to \infty, \quad \text{where } S_n = \sum_{k=1}^{n} X_k, \quad n = 1, 2, \ldots.$$

11. Let $\{X_n\}$ be a sequence of iid RVs with $P\{X_n \ge 0\} = 1$. Let $S_n = \sum_{j=1}^{n} X_j$, $n = 1, 2, \ldots$. Suppose that $\{a_n\}$ is a sequence of constants such that $a_n^{-1} S_n \xrightarrow{P} 1$. Show that (a) $a_n \to \infty$ as $n \to \infty$, and (b) $a_{n+1}/a_n \to 1$.

6.4 STRONG LAW OF LARGE NUMBERS†


In this section we obtain a stronger form of the law of large numbers discussed in 
Section 6.3. Let X;, X2,... be a sequence of RVs defined on a probability space 
(Q, S, P). 


Definition 1. We say that the sequence $\{X_n\}$ obeys the strong law of large numbers (SLLN) with respect to the norming constants $\{B_n\}$ if there exists a sequence of (centering) constants $\{A_n\}$ such that

(1) $B_n^{-1}(S_n - A_n) \xrightarrow{a.s.} 0$ as $n \to \infty$.

Here $B_n > 0$ and $B_n \to \infty$ as $n \to \infty$.


We will obtain sufficient conditions for a sequence {X,} to obey the SLLN. In 
what follows we will be interested mainly in the case B, = n. Indeed, when we 
speak of the SLLN we will assume that we are speaking of the norming constants 
B, =n, unless specified otherwise. 

We start with the Borel-Cantelli lemma. Let $\{A_j\}$ be any sequence of events in $\mathcal{S}$. We recall that

(2) $\limsup_{n\to\infty} A_n = \lim_{n\to\infty} \bigcup_{k=n}^{\infty} A_k = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$

We will write $A = \limsup_{n\to\infty} A_n$. Note that $A$ is the event that infinitely many of the $A_n$ occur. We will sometimes write

$$PA = P\big(\limsup_{n\to\infty} A_n\big) = P(A_n \text{ i.o.}),$$

where "i.o." stands for "infinitely often." In view of Theorem 6.2.11 and Remark 6.2.6 we have $X_n \xrightarrow{a.s.} 0$ if and only if $P\{|X_n| > \varepsilon \text{ i.o.}\} = 0$ for all $\varepsilon > 0$.


Theorem 1 (Borel-Cantelli Lemma)

(a) Let $\{A_n\}$ be a sequence of events such that $\sum_{n=1}^{\infty} PA_n < \infty$. Then $PA = 0$.

(b) If $\{A_n\}$ is an independent sequence of events such that $\sum_{n=1}^{\infty} PA_n = \infty$, then $PA = 1$.


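Proof.

(a) $PA = P\big(\lim_{n\to\infty} \bigcup_{k=n}^{\infty} A_k\big) = \lim_{n\to\infty} P\big(\bigcup_{k=n}^{\infty} A_k\big) \le \lim_{n\to\infty} \sum_{k=n}^{\infty} PA_k = 0.$

(b) We have $A^c = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c$, so that

$$PA^c = P\Big(\lim_{n\to\infty} \bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{n\to\infty} P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big).$$

For $n_0 > n$, we see that $\bigcap_{k=n}^{\infty} A_k^c \subset \bigcap_{k=n}^{n_0} A_k^c$, so that

$$P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) \le \lim_{n_0\to\infty} P\Big(\bigcap_{k=n}^{n_0} A_k^c\Big) = \lim_{n_0\to\infty} \prod_{k=n}^{n_0} (1 - PA_k),$$

because $\{A_n\}$ is an independent sequence of events. Now we use the elementary inequality

$$1 - \exp\Big(-\sum_{j=n}^{n_0} \alpha_j\Big) \le 1 - \prod_{j=n}^{n_0} (1 - \alpha_j) \le \sum_{j=n}^{n_0} \alpha_j, \qquad n_0 > n, \quad 1 \ge \alpha_j \ge 0,$$

to conclude that

$$P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) \le \lim_{n_0\to\infty} \exp\Big(-\sum_{k=n}^{n_0} PA_k\Big).$$

Since the series $\sum_{n=1}^{\infty} PA_n$ diverges, it follows that $PA^c = 0$ or $PA = 1$.

†This section may be omitted on first reading.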


Corollary. Let $\{A_n\}$ be a sequence of independent events. Then $PA$ is either 0 or 1.


The corollary follows since $\sum_{n=1}^{\infty} PA_n$ either converges or diverges.


As a simple application of the Borel—Cantelli lemma, we obtain a version of the 
SLLN. 


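Theorem 2. If $X_1, X_2, \ldots$ are iid RVs with common mean $\mu$ and finite fourth moment, then

$$P\Big\{\lim_{n\to\infty} \frac{S_n}{n} = \mu\Big\} = 1.$$

Proof. We have

$$E\Big\{\sum_{i=1}^{n} (X_i - \mu)\Big\}^4 = nE(X_1 - \mu)^4 + 6\binom{n}{2}\sigma^4 \le Cn^2.$$

By Markov's inequality,

$$P\Big\{\Big|\sum_{i=1}^{n} (X_i - \mu)\Big| > n\varepsilon\Big\} \le \frac{E\big[\sum_{i=1}^{n} (X_i - \mu)\big]^4}{(n\varepsilon)^4} \le \frac{Cn^2}{(n\varepsilon)^4} = \frac{C'}{n^2}.$$

Therefore,

$$\sum_{n=1}^{\infty} P\{|S_n - \mu n| > n\varepsilon\} < \infty,$$

and it follows by the Borel-Cantelli lemma that with probability 1 only finitely many of the events $\{\omega : |(S_n/n) - \mu| > \varepsilon\}$ occur, that is, $PA_\varepsilon = 0$, where

$$A_\varepsilon = \limsup_{n\to\infty} \Big\{\Big|\frac{S_n}{n} - \mu\Big| > \varepsilon\Big\}.$$

The sets $A_\varepsilon$ increase, as $\varepsilon \to 0$, to the $\omega$ set on which $S_n/n \nrightarrow \mu$. Letting $\varepsilon \to 0$ through a countable set of values, we have

$$P\Big\{\frac{S_n}{n} - \mu \nrightarrow 0\Big\} = P\Big(\bigcup_{k} A_{1/k}\Big) = 0.$$

Corollary. If $X_1, X_2, \ldots$ are iid RVs such that $P\{|X_n| < K\} = 1$ for all $n$, where $K$ is a positive constant, then $n^{-1} S_n \xrightarrow{a.s.} \mu$.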


Theorem 3. Let $X_1, X_2, \ldots$ be a sequence of independent RVs. Then

$$X_n \xrightarrow{a.s.} 0 \Leftrightarrow \sum_{n=1}^{\infty} P\{|X_n| > \varepsilon\} < \infty \quad \text{for all } \varepsilon > 0.$$

Proof. Writing $A_n = \{|X_n| > \varepsilon\}$, we see that $\{A_n\}$ is a sequence of independent events. Since $X_n \xrightarrow{a.s.} 0$, $X_n \to 0$ on a set $E^c$ with $PE = 0$. A point $\omega \in E^c$ belongs only to a finite number of $A_n$. It follows that

$$\limsup_{n\to\infty} A_n \subset E,$$

hence $P(A_n \text{ i.o.}) = 0$. By the Borel-Cantelli lemma [Theorem 1(b)] we must have $\sum_{n=1}^{\infty} PA_n < \infty$. [Otherwise, $\sum_{n=1}^{\infty} PA_n = \infty$, and then $P(A_n \text{ i.o.}) = 1$.]

In the other direction, let

$$A_{1/k} = \limsup_{n\to\infty} \Big\{|X_n| > \frac{1}{k}\Big\},$$

and use the argument in the proof of Theorem 2.
Example 1. We take an application of the Borel-Cantelli lemma to prove a.s. convergence. Let $\{X_n\}$ have PMF

$$P(X_n = 0) = 1 - \frac{1}{n^{\alpha}} \quad \text{and} \quad P(X_n = \pm n) = \frac{1}{2n^{\alpha}}.$$

Then $P(|X_n| > \varepsilon) = 1/n^{\alpha}$ and it follows that

$$\sum_{n=1}^{\infty} P(|X_n| > \varepsilon) = \sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} < \infty \quad \text{for } \alpha > 1.$$

Thus from the Borel-Cantelli lemma $P(A_n \text{ i.o.}) = 0$, where $A_n = \{|X_n| > \varepsilon\}$. Now using the argument in the proof of Theorem 2, we can show that $P\{X_n \nrightarrow 0\} = 0$.

We next prove some important lemmas that we will need subsequently. 
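Lemma 1 (Kolmogorov's Inequality). Let $X_1, X_2, \ldots, X_n$ be independent RVs with common mean 0 and variances $\sigma_k^2$, $k = 1, 2, \ldots, n$, respectively. Then for any $\varepsilon > 0$,

(3) $P\Big\{\max_{1 \le k \le n} |S_k| \ge \varepsilon\Big\} \le \sum_{i=1}^{n} \dfrac{\sigma_i^2}{\varepsilon^2}.$

Proof. Let $A_0 = \Omega$,

$$A_k = \Big\{\max_{1 \le j \le k} |S_j| < \varepsilon\Big\}, \qquad k = 1, 2, \ldots, n,$$

and

$$\begin{aligned}
B_k &= A_{k-1} \cap A_k^c \\
&= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon\} \cap \{\text{at least one of } |S_1|, \ldots, |S_k| \text{ is} \ge \varepsilon\} \\
&= \{|S_1| < \varepsilon, \ldots, |S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.
\end{aligned}$$

It follows that the $B_k$ are disjoint,

$$A_n^c = \bigcup_{k=1}^{n} B_k,$$

and

$$B_k \subset \{|S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\}.$$

As usual, let us write $I_{B_k}$ for the indicator function of the event $B_k$. Then

$$\begin{aligned}
E(S_n I_{B_k})^2 &= E\{(S_n - S_k) I_{B_k} + S_k I_{B_k}\}^2 \\
&= E\{(S_n - S_k)^2 I_{B_k} + S_k^2 I_{B_k} + 2S_k (S_n - S_k) I_{B_k}\}.
\end{aligned}$$

Since $S_n - S_k = X_{k+1} + \cdots + X_n$ and $S_k I_{B_k}$ are independent, and $EX_k = 0$ for all $k$, it follows that

$$E(S_n I_{B_k})^2 = E\{(S_n - S_k) I_{B_k}\}^2 + E(S_k I_{B_k})^2 \ge E(S_k I_{B_k})^2 \ge \varepsilon^2 PB_k.$$

The last inequality follows from the fact that in $B_k$, $|S_k| \ge \varepsilon$. Moreover,

$$\sum_{k=1}^{n} E(S_n I_{B_k})^2 = E(S_n^2 I_{A_n^c}) \le E(S_n^2) = \sum_{k=1}^{n} \sigma_k^2,$$

so that

$$\sum_{k=1}^{n} \sigma_k^2 \ge \varepsilon^2 \sum_{k=1}^{n} PB_k = \varepsilon^2 P(A_n^c),$$

as asserted.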
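Kolmogorov's inequality can be checked exactly in a small case. The sketch below assumes the simplest admissible model, iid steps $X_k = \pm 1$ with probability $\tfrac12$ each (mean 0, variance 1), an integer threshold, and it enumerates all $2^n$ paths; the function names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate, product

def prob_max_exceeds(n, eps):
    """Exact P{max_{1<=k<=n} |S_k| >= eps} for iid +1/-1 steps (integer eps)."""
    hits = sum(
        1
        for steps in product((-1, 1), repeat=n)
        if max(abs(s) for s in accumulate(steps)) >= eps
    )
    return Fraction(hits, 2 ** n)

def kolmogorov_bound(n, eps):
    """Right side of the inequality: the variances sum to n for +1/-1 steps."""
    return Fraction(n, eps ** 2)

n, eps = 10, 5
exact = prob_max_exceeds(n, eps)
bound = kolmogorov_bound(n, eps)
```

The bound controls the whole path up to time $n$, yet it equals the Chebychev bound for the endpoint $S_n$ alone, which is the content of the lemma.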


Corollary. Take $n = 1$; then

$$P\{|X_1| \ge \varepsilon\} \le \frac{\sigma_1^2}{\varepsilon^2},$$

which is Chebychev's inequality.


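Lemma 2 (Kronecker Lemma). If $\sum_{n=1}^{\infty} x_n$ converges to $s$ (finite) and $b_n \uparrow \infty$, then

$$b_n^{-1} \sum_{k=1}^{n} b_k x_k \to 0.$$

Proof. Writing $b_0 = 0$, $a_k = b_k - b_{k-1}$, and $s_{n+1} = \sum_{k=1}^{n} x_k$, we have

$$\begin{aligned}
\frac{1}{b_n} \sum_{k=1}^{n} b_k x_k &= \frac{1}{b_n} \sum_{k=1}^{n} b_k (s_{k+1} - s_k) \\
&= \frac{1}{b_n} \Big[b_n s_{n+1} + \sum_{k=1}^{n} b_{k-1} s_k\Big] - \frac{1}{b_n} \sum_{k=1}^{n} b_k s_k \\
&= s_{n+1} - \frac{1}{b_n} \sum_{k=1}^{n} (b_k - b_{k-1}) s_k \\
&= s_{n+1} - \frac{1}{b_n} \sum_{k=1}^{n} a_k s_k.
\end{aligned}$$

It therefore suffices to show that $b_n^{-1} \sum_{k=1}^{n} a_k s_k \to s$. Since $s_n \to s$, there exists an $n_0 = n_0(\varepsilon)$ such that

$$|s_n - s| < \frac{\varepsilon}{2} \quad \text{for } n > n_0.$$

Since $b_n \uparrow \infty$, let $n_1$ be an integer $> n_0$ such that

$$b_n^{-1} \Big|\sum_{k=1}^{n_0} (b_k - b_{k-1})(s_k - s)\Big| < \frac{\varepsilon}{2} \quad \text{for } n > n_1.$$

Writing

$$r_n = b_n^{-1} \sum_{k=1}^{n} (b_k - b_{k-1}) s_k,$$

we see that

$$|r_n - s| = \frac{1}{b_n} \Big|\sum_{k=1}^{n} (b_k - b_{k-1})(s_k - s)\Big|,$$

and choosing $n > n_1$, we have

$$|r_n - s| \le \Big|\frac{1}{b_n} \sum_{k=1}^{n_0} (b_k - b_{k-1})(s_k - s)\Big| + \frac{1}{b_n} \Big|\sum_{k=n_0+1}^{n} (b_k - b_{k-1}) \frac{\varepsilon}{2}\Big| < \varepsilon.$$

This completes the proof.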
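A quick numerical check of the Kronecker lemma: the sketch below assumes the convergent series $x_k = 1/k^2$ and the weights $b_k = k$ (both chosen only for illustration), for which the weighted average reduces to $H_n/n$ with $H_n$ the $n$th harmonic number.

```python
def kronecker_average(x, b, n):
    """Return (1 / b(n)) * sum_{k=1}^{n} b(k) * x(k)."""
    return sum(b(k) * x(k) for k in range(1, n + 1)) / b(n)

x = lambda k: 1.0 / k ** 2   # sum of x(k) converges (to pi**2 / 6)
b = lambda k: float(k)       # b(k) increases to infinity

averages = [kronecker_average(x, b, n) for n in (10, 1_000, 100_000)]
```

The averages decrease toward 0, slowly (like $\log n / n$ here), as the lemma predicts.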


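Theorem 4. If $\sum_{n=1}^{\infty} \operatorname{var}(X_n) < \infty$, then $\sum_{n=1}^{\infty} (X_n - EX_n)$ converges almost surely.

Proof. Without loss of generality, assume that $EX_n = 0$. By Kolmogorov's inequality,

$$P\Big\{\max_{1 \le k \le n} |S_{m+k} - S_m| \ge \varepsilon\Big\} \le \frac{1}{\varepsilon^2} \sum_{k=1}^{n} \operatorname{var}(X_{m+k}).$$

Letting $n \to \infty$, we have

$$P\Big\{\sup_{k \ge 1} |S_{m+k} - S_m| \ge \varepsilon\Big\} = P\Big\{\sup_{k \ge m+1} |S_k - S_m| \ge \varepsilon\Big\} \le \frac{1}{\varepsilon^2} \sum_{k=m+1}^{\infty} \operatorname{var}(X_k).$$

It follows that

$$\lim_{m\to\infty} P\Big\{\sup_{k > m} |S_k - S_m| < \varepsilon\Big\} = 1,$$

and since $\varepsilon > 0$ is arbitrary, we have

$$P\Big\{\lim_{m\to\infty} \sup_{k > m} |S_k - S_m| = 0\Big\} = 1.$$

Consequently, $\sum_{j=1}^{\infty} X_j$ converges a.s.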


As a corollary we get a version of the SLLN for nonidentically distributed RVs 
which subsumes Theorem 2. 


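Corollary 1. Let $\{X_n\}$ be independent RVs. If

$$\sum_{k=1}^{\infty} \frac{\operatorname{var}(X_k)}{B_k^2} < \infty, \qquad B_n \uparrow \infty,$$

then

$$\frac{S_n - ES_n}{B_n} \xrightarrow{a.s.} 0.$$

The corollary follows from Theorem 4 and the Kronecker lemma.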


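Corollary 2. Every sequence $\{X_n\}$ of independent RVs with uniformly bounded variances obeys the SLLN.

If $\operatorname{var}(X_k) \le A$ for all $k$, and $B_k = k$, then

$$\sum_{k=1}^{\infty} \frac{\operatorname{var}(X_k)}{k^2} \le A \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty,$$

and it follows that

$$\frac{S_n - ES_n}{n} \xrightarrow{a.s.} 0.$$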


Corollary 3 (Borel's Strong Law of Large Numbers). For a sequence of Bernoulli trials with (constant) probability $p$ of success, the SLLN holds (with $B_n = n$ and $A_n = np$).

Since

$$EX_k = p, \qquad \operatorname{var}(X_k) = p(1 - p) \le \tfrac14, \qquad 0 < p < 1,$$

the result follows from Corollary 2.

Corollary 4. Let $\{X_n\}$ be iid RVs with common mean $\mu$ and finite variance $\sigma^2$. Then

$$\frac{S_n}{n} \xrightarrow{a.s.} \mu.$$

Remark 1. Kolmogorov's SLLN is much stronger than Corollaries 1 and 4 of Theorem 4. It states that if $\{X_n\}$ is a sequence of iid RVs, then

$$n^{-1} S_n \xrightarrow{a.s.} \mu \Leftrightarrow E|X_1| < \infty,$$

and then $\mu = EX_1$. The proof requires more work and will not be given here. We refer the reader to Billingsley [5], Chung [14], Feller [23], or Laha and Rohatgi [56].


PROBLEMS 6.4 


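1. For the following sequences of independent RVs, does the SLLN hold?
(a) $P\{X_k = \pm 2^k\} = \tfrac12$.
(b) $P\{X_k = \pm k\} = 1/(2\sqrt{k})$, $P\{X_k = 0\} = 1 - (1/\sqrt{k})$.
(c) $P\{X_k = \pm 2^k\} = 1/2^{2k+1}$, $P\{X_k = 0\} = 1 - (1/2^{2k})$.

2. Let $X_1, X_2, \ldots$ be a sequence of independent RVs with $\sum_{k=1}^{\infty} \operatorname{var}(X_k)/k^2 < \infty$. Show that

$$\frac{1}{n^2} \sum_{k=1}^{n} \operatorname{var}(X_k) \to 0 \quad \text{as } n \to \infty.$$

Does the converse also hold?

3. For what values of $\alpha$ does the SLLN hold for the sequence

$$P\{X_k = \pm k^{\alpha}\} = \tfrac12?$$

4. Let $\{\sigma_k^2\}$ be a sequence of real numbers such that $\sum_{k=1}^{\infty} \sigma_k^2/k^2 = \infty$. Show that there exists a sequence of independent RVs $\{X_k\}$ with $\operatorname{var}(X_k) = \sigma_k^2$, $k = 1, 2, \ldots$, such that $n^{-1} \sum_{k=1}^{n} (X_k - EX_k)$ does not converge to 0 almost surely. [Hint: Let $P\{X_k = \pm k\} = \sigma_k^2/2k^2$, $P\{X_k = 0\} = 1 - (\sigma_k^2/k^2)$ if $\sigma_k/k \le 1$, and $P\{X_k = \pm\sigma_k\} = \tfrac12$ if $\sigma_k/k > 1$. Apply the Borel-Cantelli lemma to $\{|X_n| \ge n\}$.]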


5. Let $X_n$ be a sequence of iid RVs with $E|X_n| = +\infty$. Show that for every positive number $A$, $P\{|X_n| > nA \text{ i.o.}\} = 1$ and $P\{|S_n| > nA \text{ i.o.}\} = 1$.


6. Construct an example to show that the converse of Theorem 1(a) does not hold.

7. Investigate a.s. convergence of $\{X_n\}$ to 0 in each case. ($X_n$'s are independent in each case.)
(a) $P(X_n = e^n) = 1/n^2$, $P(X_n = 0) = 1 - 1/n^2$.
(b) $P(X_n = 0) = 1 - 1/n$, $P(X_n = \pm 1) = 1/(2n)$.


6.5 LIMITING MOMENT GENERATING FUNCTIONS 


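Let $X_1, X_2, \ldots$ be a sequence of RVs. Let $F_n$ be the DF of $X_n$, $n = 1, 2, \ldots$, and suppose that the MGF $M_n(t)$ of $F_n$ exists. What happens to $M_n(t)$ as $n \to \infty$? If it converges, does it always converge to an MGF?


Example 1. Let $\{X_n\}$ be a sequence of RVs with PMF $P\{X_n = -n\} = 1$, $n = 1, 2, \ldots$. We have

$$M_n(t) = Ee^{tX_n} = e^{-tn} \to 0 \quad \text{as } n \to \infty \text{ for all } t > 0,$$

and

$$M_n(t) \to +\infty \quad \text{for all } t < 0, \quad \text{and} \quad M_n(t) \to 1 \quad \text{at } t = 0.$$

Thus

$$M_n(t) \to M(t) = \begin{cases} 0, & t > 0, \\ 1, & t = 0, \\ \infty, & t < 0, \end{cases} \quad \text{as } n \to \infty.$$

But $M(t)$ is not an MGF. Note that if $F_n$ is the DF of $X_n$, then

$$F_n(x) = \begin{cases} 0 & \text{if } x < -n, \\ 1 & \text{if } x \ge -n, \end{cases} \quad \to F(x) = 1 \quad \text{for all } x,$$

and $F$ is not a DF.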


Next suppose that $X_n$ has MGF $M_n$ and $X_n \xrightarrow{L} X$, where $X$ is an RV with MGF $M$. Does $M_n(t) \to M(t)$ as $n \to \infty$? The answer to this question is in the negative.


Example 2 (Curtiss [18]). Consider the DF

$$F_n(x) = \begin{cases} 0, & x \le -n, \\ \tfrac12 + c_n \tan^{-1}(nx), & -n < x \le n, \\ 1, & x > n, \end{cases}$$

where $c_n = 1/[2\tan^{-1}(n^2)]$. Clearly, as $n \to \infty$,

$$F_n(x) \to F(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

at all points of continuity of the DF $F$. The MGF associated with $F_n$ is

$$M_n(t) = \int_{-n}^{n} c_n e^{tx} \frac{n}{1 + n^2x^2}\, dx,$$

which exists for all $t$. The MGF corresponding to $F$ is $M(t) = 1$ for all $t$. But $M_n(t) \nrightarrow M(t)$, since $M_n(t) \to \infty$ if $t \ne 0$. Indeed,

$$M_n(t) > \int_0^n c_n \frac{|t|^3 x^3}{6} \cdot \frac{n}{1 + n^2x^2}\, dx \to \infty \quad \text{as } n \to \infty.$$


The following result is a weaker version of the continuity theorem due to Lévy 
and Cramér. We refer the reader to Lukacs [68, p. 47], or Curtiss [18], for details of 
the proof. 


Theorem 1 (Continuity Theorem). Let $\{F_n\}$ be a sequence of DFs with corresponding MGFs $\{M_n\}$, and suppose that $M_n(t)$ exists for $|t| \le t_0$ for every $n$. If there exists a DF $F$ with corresponding MGF $M$ which exists for $|t| \le t_1 < t_0$ such that $M_n(t) \to M(t)$ as $n \to \infty$ for every $t \in [-t_1, t_1]$, then $F_n \xrightarrow{w} F$.


Example 3. Let $X_n$ be an RV with PMF

$$P\{X_n = 1\} = \frac{1}{n}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n}.$$

Then $M_n(t) = (1/n)e^t + [1 - (1/n)]$ exists for all $t \in \mathbb{R}$, and $M_n(t) \to 1$ as $n \to \infty$ for all $t$. Here $M(t) = 1$ is the MGF of an RV $X$ degenerate at 0. Thus $X_n \xrightarrow{L} X$.


Remark 1. The following notation on orders of magnitude is quite useful. We write $x_n = o(r_n)$ if given $\varepsilon > 0$, there exists an $N$ such that $|x_n/r_n| < \varepsilon$ for all $n \ge N$, and $x_n = O(r_n)$ if there exists an $N$ and a constant $c > 0$, such that $|x_n/r_n| < c$ for all $n \ge N$. We write $x_n = O(1)$ to express the fact that $x_n$ is bounded for large $n$, and $x_n = o(1)$ to mean that $x_n \to 0$ as $n \to \infty$.

This notation is extended to RVs in an obvious manner. Thus $X_n = o_p(r_n)$ if, for every $\varepsilon > 0$ and $\delta > 0$, there exists an $N$ such that $P(|X_n/r_n| \le \delta) \ge 1 - \varepsilon$ for $n \ge N$, and $X_n = O_p(r_n)$ if for $\varepsilon > 0$, there exists a $c > 0$ and an $N$ such that $P(|X_n/r_n| \le c) \ge 1 - \varepsilon$. We write $X_n = o_p(1)$ to mean $X_n \xrightarrow{P} 0$.

The following lemma is quite useful in applications of Theorem 1.


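Lemma 1. Let us write f(x) = o(x) if $f(x)/x \to 0$ as $x \to 0$. We have


$$\lim_{n\to\infty}\left[1 + \frac{a}{n} + o\!\left(\frac{1}{n}\right)\right]^n = e^a \quad \text{for every real } a.$$

Proof. By Taylor's expansion we have


$$f(x) = f(0) + xf'(\theta x) = f(0) + xf'(0) + \{f'(\theta x) - f'(0)\}x, \quad 0 < \theta < 1.$$


If $f'(x)$ is continuous at x = 0, then as $x \to 0$,

$$f(x) = f(0) + xf'(0) + o(x).$$


Taking $f(x) = \log(1 + x)$, we have $f'(x) = (1 + x)^{-1}$, which is continuous at
x = 0, so that


$$\log(1 + x) = x + o(x).$$


Then for sufficiently large n,


$$n\log\left[1 + \frac{a}{n} + o\!\left(\frac{1}{n}\right)\right] = n\left\{\frac{a}{n} + o\!\left(\frac{1}{n}\right)\right\} = a + o(1).$$


It follows that


$$\left[1 + \frac{a}{n} + o\!\left(\frac{1}{n}\right)\right]^n = e^{a + o(1)} \to e^a,$$


as asserted.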


Example 4. Let $X_1, X_2, \ldots$ be iid b(1, p) RVs. Also, let $S_n = \sum_{k=1}^n X_k$, and let
$M_n(t)$ be the MGF of $S_n$. Then


$$M_n(t) = (q + pe^t)^n \quad \text{for all } t,$$


where q = 1 - p. If we let $n \to \infty$ in such a way that np remains constant at $\lambda$, say,
then, by Lemma 1,


$$M_n(t) = \left[1 - \frac{\lambda}{n} + \frac{\lambda}{n}e^t\right]^n = \left[1 + \frac{\lambda}{n}(e^t - 1)\right]^n \to \exp\{\lambda(e^t - 1)\} \quad \text{for all } t,$$


which is the MGF of a $P(\lambda)$ RV. Thus, the binomial distribution function approaches
the Poisson DF, provided that $n \to \infty$ in such a way that $np = \lambda > 0$.
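The convergence in Example 4 can be checked numerically. The following is a minimal sketch, not part of the original text; the values λ = 2 and t = 0.5 and the function names are illustrative assumptions.

```python
import math

def binomial_mgf(t, n, lam):
    """MGF (q + p e^t)^n of S_n ~ b(n, p) with p = lam / n."""
    p = lam / n
    return (1.0 - p + p * math.exp(t)) ** n

def poisson_mgf(t, lam):
    """MGF exp{lam (e^t - 1)} of a P(lam) RV."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam, t = 2.0, 0.5  # illustrative values, not taken from the text
for n in (10, 100, 1000, 10000):
    # The gap between the two MGFs shrinks as n grows with np = lam fixed.
    print(n, binomial_mgf(t, n, lam), poisson_mgf(t, lam))
```

The gap shrinks roughly like 1/n, which is consistent with the $o(1/n)$ bookkeeping in Lemma 1.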


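Example 5. Let $X \sim P(\lambda)$. The MGF of X is given by


$$M(t) = \exp[\lambda(e^t - 1)] \quad \text{for all } t.$$

Let $Y = (X - \lambda)/\sqrt{\lambda}$. Then the MGF of Y is given by


$$M_Y(t) = e^{-t\sqrt{\lambda}}\,M\!\left(\frac{t}{\sqrt{\lambda}}\right).$$

Also,

$$\begin{aligned}
\log M_Y(t) &= -t\sqrt{\lambda} + \log M\!\left(\frac{t}{\sqrt{\lambda}}\right) \\
&= -t\sqrt{\lambda} + \lambda\left(e^{t/\sqrt{\lambda}} - 1\right) \\
&= -t\sqrt{\lambda} + \lambda\left(\frac{t}{\sqrt{\lambda}} + \frac{t^2}{2\lambda} + \frac{t^3}{3!\,\lambda^{3/2}} + \cdots\right) \\
&= \frac{t^2}{2} + \frac{t^3}{3!\,\lambda^{1/2}} + \cdots.
\end{aligned}$$

It follows that

$$\log M_Y(t) \to \frac{t^2}{2} \quad \text{as } \lambda \to \infty,$$


so that $M_Y(t) \to e^{t^2/2}$ as $\lambda \to \infty$, which is the MGF of an $\mathcal{N}(0, 1)$ RV.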
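As a quick numerical illustration of Example 5 (a sketch only; the function name and the chosen values of t and λ are assumptions, not from the text), one can evaluate $\log M_Y(t) = -t\sqrt{\lambda} + \lambda(e^{t/\sqrt{\lambda}} - 1)$ directly and watch it approach $t^2/2$:

```python
import math

def log_mgf_standardized_poisson(t, lam):
    """log M_Y(t) for Y = (X - lam)/sqrt(lam), X ~ P(lam)."""
    root = math.sqrt(lam)
    # expm1(u) = e^u - 1 is used for accuracy when u = t/sqrt(lam) is small.
    return -t * root + lam * math.expm1(t / root)

t = 1.0  # illustrative value
for lam in (1.0, 100.0, 1e4, 1e6):
    # The difference from t^2/2 is dominated by t^3 / (3! sqrt(lam)).
    print(lam, log_mgf_standardized_poisson(t, lam), t * t / 2.0)
```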


For more examples, see Section 6.6. 


Remark 2. As pointed out earlier, working with MGFs has the disadvantage that 
the existence of MGFs is a very strong condition. Working with CFs which always 
exist, on the other hand, permits a much wider application of the continuity theorem. 
Let $\phi_n$ be the CF of $F_n$. Then $F_n \xrightarrow{w} F$ if and only if $\phi_n \to \phi$ as $n \to \infty$ on $\mathbb{R}$,
where $\phi$ is continuous at t = 0. In this case $\phi$, the limit function, is the CF of the
limit DF F.


Example 6. Let X be a C(1, 0) RV. Then its CF is given by


$$\begin{aligned}
E\exp(itX) &= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos tx}{1 + x^2}\,dx + \frac{i}{\pi}\int_{-\infty}^{\infty}\frac{\sin tx}{1 + x^2}\,dx \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos tx}{1 + x^2}\,dx = e^{-|t|},
\end{aligned}$$


since the second integral on the right side vanishes.
Let $\{X_n\}$ be iid RVs with common law $\mathcal{L}(X)$, and set $Y_n = \sum_{j=1}^n X_j/n$. Then
the CF of $Y_n$ is given by

$$\phi_n(t) = E\exp\left(it\sum_{j=1}^n \frac{X_j}{n}\right) = \prod_{j=1}^n \exp\left(-\frac{|t|}{n}\right) = \exp(-|t|)$$


for all n. It follows that $\phi_n$ is the CF of a C(1, 0) RV. We could not have derived this result
using MGFs. Also, if $U_n = \sum_{j=1}^n X_j/n^{\alpha}$ for $\alpha > 1$, then


$$\phi_{U_n}(t) = \exp\left(-\frac{|t|}{n^{\alpha - 1}}\right) \to 1$$


as $n \to \infty$ for all t. Since $\phi(t) = 1$ is continuous at t = 0, $\phi$ is the CF of the limit
DF F. Clearly, F is the DF of an RV degenerate at 0. Thus $\sum_{j=1}^n X_j/n^{\alpha} \xrightarrow{L} U$,
where P(U = 0) = 1.


PROBLEMS 6.5 


1. Let $X \sim NB(r; p)$. Show that

$$2pX \xrightarrow{L} Y \quad \text{as } p \to 0,$$

where $Y \sim \chi^2(2r)$.

2. Let $X_n \sim NB(r_n; 1 - p_n)$, n = 1, 2, .... Show that $X_n \xrightarrow{L} X$ as $r_n \to \infty$,
$p_n \to 0$, in such a way that $r_n p_n \to \lambda$, where $X \sim P(\lambda)$.

3. Let $X_1, X_2, \ldots$ be independent RVs with PMF given by $P\{X_n = \pm 1\} = \tfrac{1}{2}$,
n = 1, 2, .... Let $Z_n = \sum_{j=1}^n X_j/2^j$. Show that $Z_n \xrightarrow{L} Z$, where $Z \sim U[-1, 1]$.

4. Let $\{X_n\}$ be a sequence of RVs with $X_n \sim G(n, \beta)$, where $\beta > 0$ is a constant
(independent of n). Find the limiting distribution of $X_n/n$.

5. Let $X_n \sim \chi^2(n)$, n = 1, 2, .... Find the limiting distribution of $X_n/n^2$.

6. Let $X_1, X_2, \ldots, X_n$ be jointly normal with $EX_i = 0$, $EX_i^2 = 1$ for all i and
$\operatorname{cov}(X_i, X_j) = \rho$, i, j = 1, 2, ... ($i \ne j$). What is the limiting distribution of
$n^{-1}S_n$, where $S_n = \sum_{k=1}^n X_k$?


6.6 CENTRAL LIMIT THEOREM 


Let $X_1, X_2, \ldots$ be a sequence of RVs, and let $S_n = \sum_{k=1}^n X_k$, n = 1, 2, ....
In Sections 6.3 and 6.4 we investigated the convergence of the sequence of RVs
$B_n^{-1}(S_n - A_n)$ to the degenerate RV. In this section we examine the convergence of
$B_n^{-1}(S_n - A_n)$ to a nondegenerate RV. Suppose that for a suitable choice of constants
$A_n$ and $B_n > 0$, the RVs $B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. What are the properties of this
limit RV Y? The question as posed is far too general and is not of much interest
unless the RVs $X_i$ are suitably restricted. For example, if we take $X_1$ with DF F and
$X_2, X_3, \ldots$ to be 0 with probability 1, choosing $A_n = 0$ and $B_n = 1$ leads to F as
the limit DF.

We recall (Example 6.5.6) that if $X_1, X_2, \ldots, X_n$ are iid RVs with common law
C(1, 0), then $n^{-1}S_n$ is also C(1, 0). Again, if $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(0, 1)$ RVs,
then $n^{-1/2}S_n$ is also $\mathcal{N}(0, 1)$ (Corollary 2 to Theorem 5.3.22). We note thus that for
certain sequences of RVs there exist sequences $A_n$ and $B_n > 0$, $B_n \to \infty$, such that
$B_n^{-1}(S_n - A_n) \xrightarrow{L} Y$. In the Cauchy case $B_n = n$, $A_n = 0$, and in the normal case
$B_n = n^{1/2}$, $A_n = 0$. Moreover, we see that Cauchy and normal distributions appear
as limiting distributions, in these two cases because of the reproductive nature of
the distributions. Cauchy and normal distributions are examples of stable distributions.


Definition 1. Let $X_1, X_2$ be iid nondegenerate RVs with common DF F. Let $a_1$,
$a_2$ be any positive constants. We say that F is stable if there exist constants A and B
(depending on $a_1, a_2$) such that the RV $B^{-1}(a_1X_1 + a_2X_2 - A)$ also has the DF F.


Let $X_1, X_2, \ldots$ be iid RVs with common DF F. We remark without proof (see
Loève [64, p. 339]) that only stable distributions occur as limits. To make this statement
more precise, we make the following definition.


Definition 2. Let $X_1, X_2, \ldots$ be iid RVs with common DF F. We say that F belongs
to the domain of attraction of a distribution V if there exist norming constants
$B_n > 0$ and centering constants $A_n$ such that as $n \to \infty$,


$$P\{B_n^{-1}(S_n - A_n) \le x\} \to V(x) \tag{1}$$

at all continuity points x of V.


In view of the statement after Definition 1, we see that only stable distributions 
possess domains of attraction. From Definition 1 we also note that each stable law 
belongs to its own domain of attraction. The study of stable distributions is beyond 
the scope of this book. We restrict ourselves to seeking conditions under which the 
limit law V is the normal distribution. The importance of the normal distribution in 
Statistics is due largely to the fact that a wide class of distributions F belongs to the 
domain of attraction of the normal law. Let us consider some examples. 


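Example 1. Let $X_1, X_2, \ldots, X_n$ be iid b(1, p) RVs. Let


$$S_n = \sum_{k=1}^n X_k, \qquad A_n = ES_n = np, \qquad B_n = \sqrt{\operatorname{var}(S_n)} = \sqrt{np(1 - p)}.$$

Then

$$\begin{aligned}
M_n(t) &= E\exp\left\{\frac{S_n - np}{\sqrt{np(1 - p)}}\,t\right\} = \prod_{i=1}^n E\exp\left\{\frac{X_i - p}{\sqrt{np(1 - p)}}\,t\right\} \\
&= \exp\left\{-\frac{npt}{\sqrt{np(1 - p)}}\right\}\left[q + p\exp\left\{\frac{t}{\sqrt{np(1 - p)}}\right\}\right]^n, \qquad q = 1 - p, \\
&= \left[q\exp\left(-\frac{pt}{\sqrt{npq}}\right) + p\exp\left(\frac{qt}{\sqrt{npq}}\right)\right]^n \\
&= \left[1 + \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n.
\end{aligned}$$


It follows from Lemma 6.5.1 that


$$M_n(t) \to e^{t^2/2} \quad \text{as } n \to \infty,$$


and since $e^{t^2/2}$ is the MGF of an $\mathcal{N}(0, 1)$ RV, we have by the continuity theorem


$$P\left\{\frac{S_n - np}{\sqrt{npq}} \le x\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt \quad \text{for all } x \in \mathbb{R}.$$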
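The limit in Example 1 can be inspected by computing the exact DF of the standardized binomial sum and comparing it with the standard normal DF. This is only a sketch: the helper names and the values p = 0.3, x = 1 are illustrative assumptions, and the PMF is evaluated through `math.lgamma` to avoid overflow for larger n.

```python
import math

def std_normal_cdf(x):
    """Standard normal DF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binomial_log_pmf(k, n, p):
    """log P{S_n = k} for S_n ~ b(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def standardized_binomial_cdf(x, n, p):
    """Exact P{(S_n - np)/sqrt(np(1-p)) <= x}."""
    cutoff = math.floor(n * p + x * math.sqrt(n * p * (1.0 - p)))
    cutoff = min(cutoff, n)
    # range() is empty when cutoff < 0, so the sum is then 0.
    return sum(math.exp(binomial_log_pmf(k, n, p)) for k in range(cutoff + 1))

p, x = 0.3, 1.0  # illustrative choices, not from the text
for n in (25, 100, 400, 1600):
    print(n, round(standardized_binomial_cdf(x, n, p), 4), round(std_normal_cdf(x), 4))
```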


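Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\chi^2(1)$ RVs. Then $S_n \sim \chi^2(n)$, $ES_n = n$,
and $\operatorname{var}(S_n) = 2n$. Also let $Z_n = (S_n - n)/\sqrt{2n}$; then


$$\begin{aligned}
M_n(t) &= Ee^{tZ_n} \\
&= \exp\left(-t\sqrt{\frac{n}{2}}\right)\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}, \qquad 2t < \sqrt{2n}, \\
&= \left[\exp\left(t\sqrt{\frac{2}{n}}\right) - t\sqrt{\frac{2}{n}}\exp\left(t\sqrt{\frac{2}{n}}\right)\right]^{-n/2}, \qquad t < \sqrt{\frac{n}{2}}.
\end{aligned}$$


Using Taylor's approximation, we get


$$\exp\left(t\sqrt{\frac{2}{n}}\right) = 1 + t\sqrt{\frac{2}{n}} + \frac{t^2}{2}\left(\sqrt{\frac{2}{n}}\right)^2 + \frac{1}{6}\exp(\theta_n)\left(t\sqrt{\frac{2}{n}}\right)^3,$$


where $0 < \theta_n < t\sqrt{2/n}$. It follows that


$$M_n(t) = \left(1 - \frac{t^2}{n} + \frac{\zeta(n)}{n}\right)^{-n/2},$$

where

$$\zeta(n) = -\sqrt{\frac{2}{n}}\,t^3 + \left(\frac{t^3\sqrt{2}}{3\sqrt{n}} - \frac{2t^4}{3n}\right)\exp(\theta_n) \to 0 \quad \text{as } n \to \infty,$$


for every fixed t. We have from Lemma 6.5.1 that $M_n(t) \to e^{t^2/2}$ as $n \to \infty$ for all
real t, and it follows that $Z_n \xrightarrow{L} Z$, where Z is $\mathcal{N}(0, 1)$.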


These examples suggest that if we take iid RVs with finite variance, and take
$A_n = ES_n$, $B_n = \sqrt{\operatorname{var}(S_n)}$, then $B_n^{-1}(S_n - A_n) \xrightarrow{L} Z$, where Z is $\mathcal{N}(0, 1)$. This
is the central limit result, which we now prove. The reader should note that in both
Examples 1 and 2, we used more than just the existence of $E|X|^2$. Indeed, the MGF
exists and hence moments of all order exist. The existence of MGF is not a necessary
condition.


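Theorem 1 (Lindeberg-Lévy Central Limit Theorem). Let $\{X_n\}$ be a sequence
of iid RVs with $0 < \operatorname{var}(X_n) = \sigma^2 < \infty$ and common mean $\mu$. Let
$S_n = \sum_{j=1}^n X_j$, n = 1, 2, .... Then for every $x \in \mathbb{R}$,


$$\lim_{n\to\infty} P\left\{\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$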


Proof. The proof we give here assumes that the MGF of $X_n$ exists. Without loss
of generality, we also assume that $EX_n = 0$ and $\operatorname{var}(X_n) = 1$. Let M be the MGF of
$X_n$. Then the MGF of $S_n/\sqrt{n}$ is given by


$$M_n(t) = E\exp\left(\frac{tS_n}{\sqrt{n}}\right) = \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$$


and

$$\ln M_n(t) = n\ln M(t/\sqrt{n}) = \frac{\ln M(t/\sqrt{n})}{1/n} = \frac{L(t/\sqrt{n})}{1/n},$$


where $L(t/\sqrt{n}) = \ln M(t/\sqrt{n})$. Clearly, $L(0) = \ln(1) = 0$, so that as $n \to \infty$, the
conditions for L'Hospital's rule are satisfied. It follows that


$$\lim_{n\to\infty}\ln M_n(t) = \lim_{n\to\infty}\frac{L'(t/\sqrt{n})\,t}{2/\sqrt{n}},$$

and since $L'(0) = EX = 0$, we can use L'Hospital's rule once again, to get


$$\lim_{n\to\infty}\ln M_n(t) = \lim_{n\to\infty}\frac{L''(t/\sqrt{n})\,t^2}{2} = \frac{t^2}{2},$$


using $L''(0) = \operatorname{var}(X) = 1$. Thus

$$M_n(t) \to \exp\left(\frac{t^2}{2}\right) = M(t),$$


where M(t) is the MGF of an $\mathcal{N}(0, 1)$ RV.


Remark 1. In the proof above we could have used the Taylor series expansion of
M to arrive at the same result.


Remark 2. Even though we proved Theorem 1 for the case when the MGF of
the $X_n$'s exists, we will use the result whenever $0 < EX_n^2 = \sigma^2 < \infty$. The use of
CFs would have provided a complete proof of Theorem 1. Let $\phi$ be the CF of $X_n$.
Assuming again, without loss of generality, that $EX_n = 0$, $\operatorname{var}(X_n) = 1$, we can
write


$$\phi(t) = 1 - \tfrac{1}{2}t^2 + t^2 o(1).$$

Thus the CF of $S_n/\sqrt{n}$ is


$$\left[\phi\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left\{1 - \frac{t^2}{2n} + \frac{t^2}{n}\,o(1)\right\}^n,$$

which converges to $\exp(-t^2/2)$, which is the CF of an $\mathcal{N}(0, 1)$ RV. The devil is in the
details of the proof.


The following converse to Theorem 1 holds. 


Theorem 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs such that $n^{-1/2}S_n$ has the same distribution
for every n = 1, 2, .... Then, if $EX_i = 0$, $\operatorname{var}(X_i) = 1$, the distribution of
$X_i$ must be $\mathcal{N}(0, 1)$.


Proof. Let F be the DF of $n^{-1/2}S_n$. By the central limit theorem,


$$\lim_{n\to\infty} P\{n^{-1/2}S_n \le x\} = \Phi(x).$$


Also, $P\{n^{-1/2}S_n \le x\} = F(x)$ for each n. It follows that we must have $F(x) = \Phi(x)$.


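Example 3. Let $X_1, X_2, \ldots$ be iid RVs with common PMF


$$P\{X = k\} = p(1 - p)^k, \quad k = 0, 1, 2, \ldots, \quad 0 < p < 1, \quad q = 1 - p.$$

Then $EX = q/p$, $\operatorname{var}(X) = q/p^2$. By Theorem 1 we see that


$$P\left\{\frac{S_n - n(q/p)}{\sqrt{nq}}\,p \le x\right\} \to \Phi(x) \quad \text{as } n \to \infty \text{ for all } x \in \mathbb{R}.$$


Example 4. Let $X_1, X_2, \ldots$ be iid RVs with common $B(\alpha, \beta)$ distribution. Then


$$EX = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \operatorname{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$


By the corollary to Theorem 1, it follows that


$$\frac{S_n - n[\alpha/(\alpha + \beta)]}{\sqrt{\alpha\beta n/[(\alpha + \beta + 1)(\alpha + \beta)^2]}} \xrightarrow{L} Z,$$


where Z is $\mathcal{N}(0, 1)$.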


For nonidentically distributed RVs we state, without proof, the following result 
due to Lindeberg. 


Theorem 3. Let $X_1, X_2, \ldots$ be independent RVs with DFs $F_1, F_2, \ldots$, respectively.
Let $EX_k = \mu_k$ and $\operatorname{var}(X_k) = \sigma_k^2$, and write


$$s_n^2 = \sum_{j=1}^n \sigma_j^2.$$


If the $F_k$'s are absolutely continuous with PDFs $f_k$, assume that the relation

$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2 f_k(x)\,dx = 0 \tag{2}$$


holds for all $\varepsilon > 0$. (A similar condition can be stated for the discrete case.) Then


$$S_n^* = \frac{\sum_{j=1}^n X_j - \sum_{j=1}^n \mu_j}{s_n} \xrightarrow{L} Z \sim \mathcal{N}(0, 1). \tag{3}$$


Condition (2) is known as the Lindeberg condition.


Feller [21] has shown that condition (2) is necessary as well in the following 
sense. For independent RVs $\{X_k\}$ for which (3) holds and


$$P\left\{\max_{1 \le k \le n}|X_k - EX_k| > \varepsilon\sqrt{\operatorname{var}(S_n)}\right\} \to 0,$$


(2) holds for every $\varepsilon > 0$.
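Example 5. Let $X_1, X_2, \ldots$ be independent RVs such that $X_k$ is $U(-a_k, a_k)$.
Then $EX_k = 0$, $\operatorname{var}(X_k) = (1/3)a_k^2$. Suppose that $|a_k| < a$ and $\sum_{k=1}^n a_k^2 \to \infty$ as
$n \to \infty$. Then, writing $f_k(x) = 1/(2a_k)$ for $|x| < a_k$ and 0 otherwise,


$$\begin{aligned}
\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} x^2 f_k(x)\,dx &\le \frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} a^2 f_k(x)\,dx \\
&\le \frac{a^2}{s_n^2}\sum_{k=1}^n P\{|X_k| > \varepsilon s_n\} \le \frac{a^2}{s_n^2}\sum_{k=1}^n\frac{\operatorname{var}(X_k)}{\varepsilon^2 s_n^2} \\
&= \frac{a^2}{\varepsilon^2 s_n^2} \to 0 \quad \text{as } n \to \infty.
\end{aligned}$$


If $\sum_{k=1}^{\infty} a_k^2 < \infty$, then $s_n^2 \uparrow A^2$, say, as $n \to \infty$. For fixed k, we can find $\varepsilon_k$ such
that $\varepsilon_k A < a_k$, and then $P\{|X_k| > \varepsilon_k s_n\} \ge P\{|X_k| > \varepsilon_k A\} > 0$. For $n \ge k$, we have


$$\begin{aligned}
\frac{1}{s_n^2}\sum_{j=1}^n\int_{|x| > \varepsilon_k s_n} x^2 f_j(x)\,dx &\ge \frac{s_n^2\varepsilon_k^2}{s_n^2}\sum_{j=1}^n P\{|X_j| > \varepsilon_k s_n\} \\
&\ge \varepsilon_k^2 P\{|X_k| > \varepsilon_k s_n\} \\
&> 0,
\end{aligned}$$


so that the Lindeberg condition does not hold. Indeed, if $X_1, X_2, \ldots$ are independent
RVs such that there exists a constant A with $P\{|X_n| \le A\} = 1$ for all n, the
Lindeberg condition (2) is satisfied if $s_n^2 \to \infty$ as $n \to \infty$. To see this, suppose that
$s_n^2 \to \infty$. Since the $X_k$'s are uniformly bounded, so are the RVs $X_k - EX_k$. It follows
that for every $\varepsilon > 0$ we can find an $N_\varepsilon$ such that for $n \ge N_\varepsilon$, $P\{|X_k - EX_k| < \varepsilon s_n,
k = 1, 2, \ldots, n\} = 1$. The Lindeberg condition follows immediately. The converse
also holds, for if $\lim_{n\to\infty} s_n^2 < \infty$ and the Lindeberg condition holds, there exists a
constant $A < \infty$ such that $s_n^2 \to A^2$. For any fixed j, we can find an $\varepsilon > 0$ such that
$P\{|X_j - \mu_j| > \varepsilon A\} > 0$. Then, for $n \ge j$,


$$\begin{aligned}
\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x - \mu_k| > \varepsilon s_n}(x - \mu_k)^2 f_k(x)\,dx &\ge \varepsilon^2\sum_{k=1}^n P\{|X_k - \mu_k| > \varepsilon s_n\} \\
&\ge \varepsilon^2 P\{|X_j - \mu_j| > \varepsilon A\} \\
&> 0,
\end{aligned}$$


and the Lindeberg condition does not hold. This contradiction shows that $s_n^2 \to \infty$ is
also a necessary condition; that is, for a sequence of uniformly bounded independent
RVs, a necessary and sufficient condition for the central limit theorem to hold is
$s_n^2 \to \infty$ as $n \to \infty$.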
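For the uniform RVs of Example 5 the Lindeberg sum can be written in closed form, which makes the dichotomy above easy to see numerically. The sketch below is illustrative only: the function name and the two test sequences ($a_k = 1$ and $a_k = 2^{-k}$) are assumptions, not taken from the text. It uses $\int_{c < |x| < a_k} x^2/(2a_k)\,dx = (a_k^3 - c^3)/(3a_k)$ for $c < a_k$.

```python
import math

def lindeberg_sum(a, eps):
    """(1/s_n^2) * sum_k int_{|x| > eps*s_n} x^2 f_k(x) dx for X_k ~ U(-a_k, a_k)."""
    s2 = sum(ak * ak for ak in a) / 3.0       # s_n^2 = sum of var(X_k) = sum a_k^2 / 3
    c = eps * math.sqrt(s2)                   # truncation point eps * s_n
    total = 0.0
    for ak in a:
        if c < ak:                            # otherwise the tail integral is 0
            total += (ak ** 3 - c ** 3) / (3.0 * ak)
    return total / s2

eps = 0.5
# Case 1: a_k = 1, so s_n^2 = n/3 -> infinity and the sum is eventually 0.
print([round(lindeberg_sum([1.0] * n, eps), 4) for n in (3, 6, 12, 100)])
# Case 2: a_k = 2^{-k}, so s_n^2 stays bounded and the sum stays away from 0.
print([round(lindeberg_sum([2.0 ** -k for k in range(1, n + 1)], eps), 4)
       for n in (3, 6, 12, 50)])
```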
Example 6. Let $X_1, X_2, \ldots$ be independent RVs such that $\alpha_k = E|X_k|^{2+\delta} < \infty$
for some $\delta > 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_n = o(s_n^{2+\delta})$. Then the Lindeberg condition
is satisfied, and the central limit theorem holds. This result is due to Lyapunov. We
have


$$\frac{1}{s_n^2}\sum_{k=1}^n\int_{|x| > \varepsilon s_n} x^2 f_k(x)\,dx \le \frac{1}{\varepsilon^{\delta}s_n^{2+\delta}}\sum_{k=1}^n\int_{-\infty}^{\infty}|x|^{2+\delta}f_k(x)\,dx = \frac{\sum_{k=1}^n\alpha_k}{\varepsilon^{\delta}s_n^{2+\delta}} \to 0$$


as $n \to \infty$.
A similar argument applies in the discrete case.

Remark 3. Both the central limit theorem (CLT) and the (weak) law of large
numbers (WLLN) hold for a large class of sequences of RVs $\{X_n\}$. If the $\{X_n\}$ are
independent uniformly bounded RVs, that is, if $P\{|X_n| \le M\} = 1$, the WLLN
(Theorem 6.3.1) holds; the CLT holds provided that $s_n^2 \to \infty$ (Example 5).


If the RVs $\{X_n\}$ are iid, then the CLT is a stronger result than the WLLN in that
the former provides an estimate of the probability $P\{|S_n - n\mu|/n > \varepsilon\}$. Indeed,


$$P\{|S_n - n\mu| > n\varepsilon\} = P\left\{\frac{|S_n - n\mu|}{\sigma\sqrt{n}} > \frac{\varepsilon}{\sigma}\sqrt{n}\right\} \approx 1 - P\left\{|Z| \le \frac{\varepsilon}{\sigma}\sqrt{n}\right\},$$


where Z is $\mathcal{N}(0, 1)$, and the law of large numbers follows. On the other hand, we note
that the WLLN does not require the existence of a second moment.
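To see what the CLT estimate in Remark 3 adds, it can be set beside the Chebyshev bound $\sigma^2/(n\varepsilon^2)$. The comparison with Chebyshev and the numerical values below are illustrative assumptions rather than part of the remark; the function names are hypothetical.

```python
import math

def clt_tail_estimate(eps, n, sigma):
    """CLT estimate 1 - P{|Z| <= (eps/sigma) sqrt(n)} of P{|S_n - n*mu| > n*eps}."""
    z = (eps / sigma) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))

def chebyshev_bound(eps, n, sigma):
    """Chebyshev upper bound sigma^2 / (n eps^2) on the same probability, capped at 1."""
    return min(1.0, sigma ** 2 / (n * eps ** 2))

eps, sigma = 0.1, 1.0  # illustrative values
for n in (100, 400, 1600):
    print(n, clt_tail_estimate(eps, n, sigma), chebyshev_bound(eps, n, sigma))
```

Both quantities tend to 0 as n grows, but the CLT estimate decays much faster than the 1/n rate of the Chebyshev bound.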


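Remark 4. If $\{X_n\}$ are independent RVs, it is quite possible that the CLT may
apply to the $X_n$'s, but not the WLLN.


Example 7 (Feller [22, p. 255]). Let $\{X_k\}$ be independent RVs with PMF

$$P\{X_k = k^{\lambda}\} = P\{X_k = -k^{\lambda}\} = \tfrac{1}{2}, \quad k = 1, 2, \ldots.$$


Then $EX_k = 0$, $\operatorname{var}(X_k) = k^{2\lambda}$. Also let $\lambda > 0$; then


$$s_n^2 = \sum_{k=1}^n k^{2\lambda} \le \int_0^{n+1} x^{2\lambda}\,dx = \frac{(n + 1)^{2\lambda + 1}}{2\lambda + 1}.$$


It follows that if $0 < \lambda < \tfrac{1}{2}$, $s_n/n \to 0$, and by Corollary 2 to Theorem 6.3.1, the
WLLN holds. Now $k^{\lambda} \le n^{\lambda}$, so that the sum $\sum_{k=1}^n\sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 p_{kl}$ will be nonzero
if $n^{\lambda} > \varepsilon s_n \approx \varepsilon[n^{\lambda + 1/2}/\sqrt{2\lambda + 1}]$. It follows that as long as $n > (2\lambda + 1)\varepsilon^{-2}$,

$$\frac{1}{s_n^2}\sum_{k=1}^n\sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 p_{kl} = 0,$$


and the Lindeberg condition holds. Thus the CLT holds for $\lambda > 0$. This means that


$$P\left\{a < \sqrt{\frac{2\lambda + 1}{n^{2\lambda + 1}}}\,S_n < b\right\} \to \int_a^b \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt.$$


Thus


$$P\left\{\frac{a\,n^{\lambda + 1/2 - 1}}{\sqrt{2\lambda + 1}} < \frac{S_n}{n} < \frac{b\,n^{\lambda + 1/2 - 1}}{\sqrt{2\lambda + 1}}\right\} \to \int_a^b \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt,$$


and the WLLN cannot hold for $\lambda \ge \tfrac{1}{2}$.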


We conclude this section with some remarks concerning the application of the
CLT. Let $X_1, X_2, \ldots$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Let us write


$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}},$$


and let $z_1, z_2$ be two arbitrary real numbers with $z_1 < z_2$. If $F_n$ is the DF of $Z_n$, then


$$\lim_{n\to\infty} P\{z_1 < Z_n \le z_2\} = \lim_{n\to\infty}[F_n(z_2) - F_n(z_1)] = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt,$$


that is,

$$\lim_{n\to\infty} P\{z_1\sigma\sqrt{n} + n\mu < S_n \le z_2\sigma\sqrt{n} + n\mu\} = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-t^2/2}\,dt. \tag{4}$$


It follows that the RV $S_n = \sum_{k=1}^n X_k$ is asymptotically normally distributed (see
Section 7.5) with mean $n\mu$ and variance $n\sigma^2$. Equivalently, the RV $n^{-1}S_n$ is asymptotically
$\mathcal{N}(\mu, \sigma^2/n)$. This result is of great importance in statistics.

In Fig. 1 we show the distribution of $\bar{X}$ in sampling from $P(\lambda)$ and G(1, 1).
We have also superimposed, in each case, the graph of the corresponding normal
approximation.

How large should n be before we apply approximation (4)? Unfortunately, the an- 
swer is not simple. Much depends on the underlying distribution, the corresponding 
speed of convergence and the accuracy one desires. There is a vast amount of liter- 
ature on the speed of convergence and error bounds. We will content ourselves with 
some examples. The reader is referred to Rohatgi [88] for a detailed discussion. 


Fig. 1. (a) Distribution of $\bar{X}$ for Poisson RV with mean 3 and normal approximation;
(b) distribution of $\bar{X}$ for exponential RV with mean 1 and normal approximation.


In the discrete case when the underlying distribution is integer valued, approximation
(4) is improved by applying the continuity correction. If X is integer valued,
then for integers $x_1, x_2$


$$P\{x_1 \le X \le x_2\} = P\{x_1 - \tfrac{1}{2} < X < x_2 + \tfrac{1}{2}\},$$

which amounts to making the discrete space of values of X continuous by considering
intervals of length 1 with midpoints at integers.


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid b(1, p) RVs. Then $ES_n = np$, and
$\operatorname{var}(S_n) = np(1 - p)$, so $(S_n - np)/\sqrt{np(1 - p)}$ is approximately $\mathcal{N}(0, 1)$.

Suppose that n = 10, $p = \tfrac{1}{2}$. Then from binomial tables, $P(X \le 4) = 0.3770$.
Using normal approximation without continuity correction,


$$P(X \le 4) \approx P\left(Z \le \frac{4 - 5}{\sqrt{2.5}}\right) = P(Z \le -0.63) = 0.2643.$$


Applying continuity correction,

$$P(X \le 4) = P(X < 4.5) \approx P(Z < -0.32) = 0.3745.$$


Next suppose that n = 100, p = 0.1. Then from binomial tables P(X = 7) =
0.0889. Using normal approximation, without continuity correction,


$$P(X = 7) = P(6.0 < X < 8.0) \approx P(-1.33 < Z < -0.67) = 0.1596$$

and with continuity correction

$$P(X = 7) = P(6.5 < X < 7.5) \approx P(-1.17 < Z < -0.83) = 0.0823.$$


The rule of thumb is to use continuity correction, and normal approximation whenever
$np(1 - p) > 10$, and Poisson approximation with $\lambda = np$ for $p < 0.1$, $\lambda < 10$.
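The first part of Example 8 can be reproduced with a few lines of code. This is a sketch with hypothetical helper names; it uses unrounded z values, so the normal approximations differ slightly from the table-based figures in the text, which round z to two decimals.

```python
import math

def std_normal_cdf(z):
    """Standard normal DF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ b(n, p); intended for moderate n."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_approx_cdf(x, n, p, continuity=True):
    """Normal approximation to P(X <= x), optionally with continuity correction."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    shift = 0.5 if continuity else 0.0
    return std_normal_cdf((x + shift - mean) / sd)

n, p = 10, 0.5
print("exact            ", round(binomial_cdf(4, n, p), 4))
print("normal, no corr. ", round(normal_approx_cdf(4, n, p, continuity=False), 4))
print("normal, corrected", round(normal_approx_cdf(4, n, p, continuity=True), 4))
```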


Example 9. Let $X_1, X_2, \ldots$ be iid $P(\lambda)$ RVs. Then $S_n$ has approximately an
$\mathcal{N}(n\lambda, n\lambda)$ distribution for large n. Let n = 64, $\lambda = 0.125$. Then $S_n \sim P(8)$,
and from Poisson distribution tables, $P(S_n = 10) = 0.099$. Using normal approximation,


$$P(S_n = 10) = P(9.5 < S_n < 10.5) \approx P(0.53 < Z < 0.88) = 0.1087.$$


If n = 96, $\lambda = 0.125$, then $S_n \sim P(12)$ and

$P(S_n = 10) = 0.105$, exact,
$P(S_n = 10) \approx 0.1009$, normal approximation.
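The same kind of check works for the Poisson case of Example 9. The helper names below are hypothetical, and the normal approximation is computed with unrounded z values, so it agrees with the table-based figures in the text only to about two decimals.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_pmf(k, mean):
    """Exact P{S = k} for S ~ P(mean), computed on the log scale."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def normal_approx_point(k, mean):
    """P{k - 1/2 < S < k + 1/2} under the N(mean, mean) approximation."""
    sd = math.sqrt(mean)
    return std_normal_cdf((k + 0.5 - mean) / sd) - std_normal_cdf((k - 0.5 - mean) / sd)

for mean in (8.0, 12.0):  # n * lambda for n = 64 and n = 96 with lambda = 0.125
    print(mean, round(poisson_pmf(10, mean), 4), round(normal_approx_point(10, mean), 4))
```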


PROBLEMS 6.6 


1. Let $\{X_n\}$ be a sequence of independent RVs with the following distributions. In
each case, does the Lindeberg condition hold?

(a) $P\{X_n = \pm(1/2^n)\} = \tfrac{1}{2}$.

(b) $P\{X_n = \pm 2^{n+1}\} = 1/2^{n+3}$, $P\{X_n = 0\} = 1 - (1/2^{n+2})$.

(c) $P\{X_n = \pm 1\} = (1 - 2^{-n})/2$, $P\{X_n = \pm 2^{-n}\} = 1/2^{n+1}$.

(d) $\{X_n\}$ is a sequence of independent Poisson RVs with parameter $\lambda_n$, n =
1, 2, ..., such that $\sum_{k=1}^n \lambda_k \to \infty$.

(e) $P\{X_n = \pm 2^n\} = \tfrac{1}{2}$.


2. Let $X_1, X_2, \ldots$ be iid RVs with mean 0, variance 1, and $EX_i^4 < \infty$. Find the
limiting distribution of


X 1X2 + X3X4 +--+ + Xn-1X2M 


Zn=Jn 


3. Let $X_1, X_2, \ldots$ be iid RVs with mean $\alpha$ and variance $\sigma^2$, and let $Y_1, Y_2, \ldots$ be
iid RVs with mean $\beta$ ($\ne 0$) and variance $\tau^2$. Find the limiting distribution of
$Z_n = \sqrt{n}(\bar{X}_n - \alpha)/\bar{Y}_n$, where $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$ and $\bar{Y}_n = n^{-1}\sum_{i=1}^n Y_i$.


4. Let $X \sim b(n, \theta)$. Use the CLT to find n such that $P_\theta\{X > n/2\} \ge 1 - \alpha$. In
particular, let $\alpha = 0.10$ and $\theta = 0.45$. Calculate n, satisfying $P\{X > n/2\} \ge 0.90$.


5. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common mean $\mu$ and variance
$\sigma^2$. Also, let $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ and $S^2 = (n - 1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$. Show
that $\sqrt{n}(\bar{X} - \mu)/S \xrightarrow{L} Z$, where $Z \sim \mathcal{N}(0, 1)$.


6. Let $X_1, X_2, \ldots, X_{100}$ be iid RVs with mean 75 and variance 225. Use Chebychev's
inequality to calculate the probability that the sample mean will not differ
from the population mean by more than 6. Then use the CLT to calculate the
same probability, and compare your results.


7. Let $X_1, X_2, \ldots, X_{100}$ be iid $P(\lambda)$ RVs, where $\lambda = 0.02$. Let $S = S_{100} =
\sum_{i=1}^{100} X_i$. Use the central limit result to evaluate $P\{S \ge 3\}$, and compare your
result to the exact probability of the event $S \ge 3$.


8. Let $X_1, X_2, \ldots, X_{81}$ be iid RVs with mean 54 and variance 225. Use Chebychev's
inequality to find the possible difference between the sample mean and
the population mean with a probability of at least 0.75. Also use the CLT to do
the same.


9. Use the CLT applied to a Poisson RV to show that $\lim_{n\to\infty} e^{-nt}\sum_{k=0}^{n}(nt)^k/k! =
1$ for $0 < t < 1$, $= \tfrac{1}{2}$ if $t = 1$, and 0 if $t > 1$.

10. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with mean $\mu$ and variance $\sigma^2$, and assume
that $EX_1^4 < \infty$. Write $V_n = \sum_{k=1}^n (X_k - \mu)^2$. Find the centering and norming
constants $A_n$ and $B_n$ such that $B_n^{-1}(V_n - A_n) \xrightarrow{L} Z$, where Z is $\mathcal{N}(0, 1)$.


11. From an urn containing 10 identical balls numbered 0 through 9, n balls are
drawn with replacement.

(a) What does the law of large numbers tell you about the appearance of 0's in
the n drawings?

(b) How many drawings must be made in order that with probability at least
0.95, the relative frequency of the occurrence of 0's will be between 0.09
and 0.11?

(c) Use the CLT to find the probability that among the n numbers thus chosen,
the number 5 will appear between $(n - 3\sqrt{n})/10$ and $(n + 3\sqrt{n})/10$ times
(inclusive) if (i) n = 25, and (ii) n = 100.

12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_i = 0$ and $EX_i^2 = \sigma^2 < \infty$. Let $\bar{X} =
\sum_{i=1}^n X_i/n$, and for any positive real number $\varepsilon$, let $P_{n,\varepsilon} = P\{\bar{X} \ge \varepsilon\}$. Show that


as $n \to \infty$.


[Hint: Use (5.3.61).] 


CHAPTER 7


Sample Moments and 
Their Distributions 


7.1 INTRODUCTION 


In the preceding chapters we discussed fundamental ideas and techniques of prob- 
ability theory. In this development we created a mathematical model of a random 
experiment by associating with it a sample space in which random events correspond
to sets of a certain $\sigma$-field. The notion of probability defined on this $\sigma$-field
corresponds to the notion of uncertainty in the outcome on any performance of the 
random experiment. 

In this chapter we begin the study of some problems of mathematical statistics. 
The methods of probability theory learned in preceding chapters are used extensively 
in this study. Suppose that we seek information about some numerical characteristics 
of a collection of elements, called a population. For reasons of time or cost we may 
not wish or be able to study each element of the population. Our object is to draw 
conclusions about the unknown population characteristics on the basis of information 
on some characteristics of a suitably selected sample. Formally, let X be a random 
variable that describes the population under investigation, and let F be the DF of X. 
There are two possibilities. Either X has a DF $F_\theta$ with a known functional form
(except perhaps for the parameter $\theta$, which may be a vector), or X has a DF F about
which we know nothing (except perhaps that F is, say, absolutely continuous). In the
former case let $\Theta$ be the set of possible values of the unknown parameter $\theta$. Then
the job of a statistician is to decide on the basis of a suitably selected sample which
member or members of the family $\{F_\theta, \theta \in \Theta\}$ can represent the DF of X. Problems
of this type, called problems of parametric statistical inference, are the subject of 
investigation in Chapters 8 through 12. The case in which nothing is known about the 
functional form of the DF F of X is clearly much more difficult. Inference problems 
of this type fall into the domain of nonparametric statistics and are discussed in 
Chapter 13. 

To be sure, the scope of statistical methods is much wider than the statistical infer- 
ence problems discussed in this book. Statisticians, for example, deal with problems 
of planning and designing experiments, of collecting information, and of deciding
how best the collected information should be used. However, here we concern our- 
selves only with the best methods of making inferences about probability distribu- 
tions. 

In Section 7.2 we introduce the notions of a (simple) random sample and sample 
statistics. In Section 7.3 we study sample moments and their exact distributions, and 
in Section 7.5 we consider their large-sample approximations. In Section 7.4 we con- 
sider some important distributions that arise in sampling from a normal population. 
Sections 7.6 and 7.7 are devoted to the study of sampling from univariate and bivari- 
ate normal distributions. 


7.2 RANDOM SAMPLING 


Consider a statistical experiment that culminates in outcomes x, which are the values
assumed by an RV X. Let F be the DF of X. In practice, F will not be completely
known; that is, one or more parameters associated with F will be unknown. The
job of a statistician is to estimate these unknown parameters or to test the validity
of certain statements about them. She can obtain n independent observations on
X. This means that she observes n values $x_1, x_2, \ldots, x_n$ assumed by the RV X.
Each $x_i$ can be regarded as the value assumed by an RV $X_i$, i = 1, 2, ..., n,
where $X_1, X_2, \ldots, X_n$ are independent RVs with common DF F. The observed
values $(x_1, x_2, \ldots, x_n)$ are then values assumed by $(X_1, X_2, \ldots, X_n)$. The set
$\{X_1, X_2, \ldots, X_n\}$ is then a sample of size n taken from a population distribution
F. The set of n values $x_1, x_2, \ldots, x_n$ is called a realization of the sample. Note that
the possible values of the RV $(X_1, X_2, \ldots, X_n)$ can be regarded as points in $\mathcal{R}_n$,
which may be called the sample space. In practice one observes not $x_1, x_2, \ldots, x_n$
but some function $f(x_1, x_2, \ldots, x_n)$. Then $f(x_1, x_2, \ldots, x_n)$ are values assumed
by the RV $f(X_1, X_2, \ldots, X_n)$.
Let us now formalize these concepts. 


Definition 1. Let X be an RV with DF F, and let $X_1, X_2, \ldots, X_n$ be iid RVs with
common DF F. Then the collection $X_1, X_2, \ldots, X_n$ is known as a random sample
of size n from the DF F or simply as n independent observations on X.


If $X_1, X_2, \ldots, X_n$ is a random sample from F, their joint DF is given by

$$F^*(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n F(x_i). \tag{1}$$


Definition 2. Let $X_1, X_2, \ldots, X_n$ be n independent observations on an RV X,
and let $f: \mathcal{R}_n \to \mathcal{R}_k$ be a Borel-measurable function. Then the RV $f(X_1, X_2, \ldots,
X_n)$ is called a (sample) statistic provided that it is not a function of any unknown
parameter(s).
Two of the most commonly used statistics are defined as follows.


Definition 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution function
F. Then the statistic

$$\bar{X} = n^{-1}S_n = \sum_{i=1}^n \frac{X_i}{n} \tag{2}$$

is called the sample mean, and the statistic

$$S^2 = \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^n X_i^2 - n\bar{X}^2}{n - 1} \tag{3}$$


is called the sample variance, and S is called the sample standard deviation.


Remark 1. Whenever the word sample is used subsequently, it will mean random
sample.


Remark 2. Sampling from a probability distribution (Definition 1) is sometimes 
referred to as sampling from an infinite population since one can obtain samples of 
any size one desires even if the population is finite (by sampling with replacement). 


Remark 3. In sampling without replacement from a finite population, the independence
condition of Definition 1 is not satisfied. Suppose that a sample of size 2 is
taken from a finite population $(a_1, a_2, \ldots, a_N)$ without replacement. Let $X_i$ be the
outcome on the ith draw. Then $P\{X_1 = a_1\} = 1/N$, $P\{X_2 = a_2 \mid X_1 = a_1\} =
1/(N - 1)$, and $P\{X_2 = a_2 \mid X_1 = a_2\} = 0$. Thus the PMF of $X_2$ depends on
the outcome of the first draw (that is, on the value of $X_1$), and $X_1$ and $X_2$ are not
independent. Note, however, that


$$\begin{aligned}
P\{X_2 = a_2\} &= \sum_{j=1}^N P\{X_1 = a_j\}P\{X_2 = a_2 \mid X_1 = a_j\} \\
&= \sum_{j \ne 2} P\{X_1 = a_j\}P\{X_2 = a_2 \mid X_1 = a_j\} = \frac{1}{N},
\end{aligned}$$


and $X_1 \overset{d}{=} X_2$. A similar argument can be used to show that $X_1, X_2, \ldots, X_n$ all
have the same distribution but they are not independent. In fact, $X_1, X_2, \ldots, X_n$ are
exchangeable RVs. Sampling without replacement from a finite population is often
referred to as simple random sampling.
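The computation in Remark 3 can be verified by brute-force enumeration for a small population. The sketch below assumes a hypothetical population of N = 4 labelled elements; the labels and helper names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

population = ["a1", "a2", "a3", "a4"]  # hypothetical finite population, N = 4
N = len(population)

# Every ordered pair of distinct elements is an equally likely sample of size 2.
pairs = list(permutations(population, 2))
prob = Fraction(1, len(pairs))

def marginal(position, value):
    """P{X_(position+1) = value}, by summing over all ordered samples."""
    return sum(prob for pair in pairs if pair[position] == value)

def joint(v1, v2):
    """P{X_1 = v1, X_2 = v2}."""
    return sum(prob for pair in pairs if pair == (v1, v2))

print(marginal(0, "a2"), marginal(1, "a2"))           # both equal 1/N
print(joint("a1", "a2"), marginal(0, "a1") * marginal(1, "a2"))  # differ: not independent
```

The two marginals agree (identical distributions), while the joint probability differs from the product of the marginals, which is the dependence described in the remark.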


Remark 4. It should be remembered that sample statistics $\bar{X}$, $S^2$ (and others that
we will define later) are random variables, while the population parameters $\mu$, $\sigma^2$,
and so on, are fixed constants that may be unknown.

Remark 5. In (3) we divide by n - 1 rather than n. The reason for this will
become clear in the next section.


Remark 6. Other frequently occurring examples of statistics are sample order
statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ and their functions, as well as sample moments, which
will be studied in the next section.


Example 1. Let $X \sim b(1, p)$, where p is possibly unknown. The DF of X is
given by


$$F(x) = p\,\varepsilon(x - 1) + (1 - p)\,\varepsilon(x), \quad x \in \mathbb{R}.$$


Suppose that five independent observations on X are 0, 1, 1, 1, 0. Then 0, 1, 1, 1,
0 is a realization of the sample $X_1, X_2, \ldots, X_5$. The sample mean is


$$\bar{x} = \frac{0 + 1 + 1 + 1 + 0}{5} = 0.6,$$


which is the value assumed by the RV $\bar{X}$. The sample variance is


$$s^2 = \sum_{i=1}^5 \frac{(x_i - \bar{x})^2}{5 - 1} = \frac{2(0.6)^2 + 3(0.4)^2}{4} = 0.3,$$


which is the value assumed by the RV $S^2$. Also $s = \sqrt{0.3} = 0.55$.
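The arithmetic of Example 1 follows directly from Definition 3. A minimal sketch, with hypothetical function names, using the five observations from the example:

```python
import math

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with divisor n - 1, as in (3)."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [0, 1, 1, 1, 0]           # the realization used in Example 1
xbar = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)
print(xbar, s2, round(s, 2))
```

Python's standard library also offers `statistics.mean` and `statistics.variance`, the latter using the same n - 1 divisor.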


Example 2. Let $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is known but $\sigma^2$ is unknown. Let
$X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then, according to our definition,
$\sum_{i=1}^n X_i/\sigma^2$ is not a statistic.

Suppose that five observations on X are —0.864, 0.561, 2.355, 0.582, —0.774. 
Then the sample mean is 0.372, and the sample variance is 1.648. 


PROBLEMS 7.2 


1. Let X be a $b(1, \tfrac{1}{2})$ RV, and consider all possible random samples of size 3 on X.
Compute $\bar{X}$ and $S^2$ for each of the eight samples, and also compute the PMFs of
$\bar{X}$ and $S^2$.


2. A die is rolled. Let X be the face value that turns up, and $X_1, X_2$ be two independent
observations on X. Compute the PMF of $\bar{X}$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from some population. Show that


$$\max_{1 \le i \le n}|X_i - \bar{X}| < \frac{(n - 1)S}{\sqrt{n}}$$

unless either all the n observations are equal or exactly n - 1 of the $X_j$'s are
equal. (Samuelson [97])


4. Let $x_1, x_2, \ldots, x_n$ be real numbers, and let $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$, $x_{(1)} =
\min\{x_1, x_2, \ldots, x_n\}$. Show that for any set of real numbers $a_1, a_2, \ldots, a_n$ such
that $\sum_{i=1}^n a_i = 0$, the following inequality holds:


$$\left|\sum_{i=1}^n a_i x_i\right| \le \frac{1}{2}\left(x_{(n)} - x_{(1)}\right)\sum_{i=1}^n |a_i|.$$


5. For any set of real numbers $x_1, x_2, \ldots, x_n$, show that the fraction of $x_1, x_2, \ldots, x_n$
included in the interval $(\bar{x} - ks, \bar{x} + ks)$ for $k \ge 1$ is at least $1 - 1/k^2$. Here $\bar{x}$
is the mean and s the standard deviation of the x's.


7.3 SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS 


Let $X_1, X_2, \ldots, X_n$ be a sample from a population DF F. In this section we consider
some commonly used sample characteristics and their distributions.


Definition 1. Let $F_n^*(x) = n^{-1}\sum_{j=1}^n \varepsilon(x - X_j)$. Then $nF_n^*(x)$ is the number of
$X_k$'s ($1 \le k \le n$) that are $\le x$. $F_n^*(x)$ is called the sample (or empirical) distribution
function.


We note that $0 \le F_n^*(x) \le 1$ for all x, and moreover, that $F_n^*$ is right continuous,
nondecreasing, and $F_n^*(-\infty) = 0$, $F_n^*(\infty) = 1$. Thus $F_n^*$ is a DF.
If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is the order statistic for $X_1, X_2, \ldots, X_n$, then clearly

$$F_n^*(x) = \begin{cases} 0 & \text{if } x < X_{(1)}, \\ \dfrac{k}{n} & \text{if } X_{(k)} \le x < X_{(k+1)} \quad (k = 1, 2, \ldots, n - 1), \\ 1 & \text{if } x \ge X_{(n)}. \end{cases} \tag{1}$$

For fixed but otherwise arbitrary $x \in \mathbb{R}$, $F_n^*(x)$ itself is an RV of the discrete type.
The following result is immediate.


Theorem 1. The RV $F_n^*(x)$ has the probability function


$$P\left\{F_n^*(x) = \frac{j}{n}\right\} = \binom{n}{j}[F(x)]^j[1 - F(x)]^{n-j}, \quad j = 0, 1, \ldots, n, \tag{2}$$

with mean

$$EF_n^*(x) = F(x) \tag{3}$$


and variance

$$\operatorname{var}(F_n^*(x)) = \frac{F(x)[1 - F(x)]}{n}. \tag{4}$$


Proof. Since $\varepsilon(x - X_j)$, j = 1, 2, ..., n, are iid RVs, each with PMF

$$P\{\varepsilon(x - X_j) = 1\} = P\{x - X_j \ge 0\} = F(x)$$

and

$$P\{\varepsilon(x - X_j) = 0\} = 1 - F(x),$$


their sum $nF_n^*(x)$ is a b(n, p) RV, where p = F(x). Relations (2), (3), and (4) follow
immediately.


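Corollary 1. For each $x \in \mathcal{R}$,

$$F_n^*(x) \xrightarrow{P} F(x) \quad \text{as } n \to \infty.$$


Corollary 2. For each $x \in \mathcal{R}$,

$$\frac{\sqrt{n}\,[F_n^*(x) - F(x)]}{\sqrt{F(x)[1 - F(x)]}} \xrightarrow{L} Z \quad \text{as } n \to \infty,$$

where $Z$ is $\mathcal{N}(0, 1)$.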


Corollary 1 follows from the WLLN and Corollary 2 from the CLT. The con- 
vergence in Corollary 1 is for each value of x. It is possible to make a probability 
statement simultaneously for all x. We state the result without proof. 


Theorem 2 (Glivenko-Cantelli Theorem). $F_n^*(x)$ converges uniformly to $F(x)$, that is, for $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left\{\sup_{-\infty < x < \infty} |F_n^*(x) - F(x)| > \varepsilon\right\} = 0.$$


For a proof of Theorem 2, we refer to Fisz [28, p. 391]. 

We next consider some typical values of the DF $F_n^*(x)$, called sample statistics. Since $F_n^*(x)$ has jump points $X_j$, $j = 1, 2, \dots, n$, it is clear that all moments of $F_n^*(x)$ exist. Let us write

$$a_k = n^{-1}\sum_{j=1}^n X_j^k \tag{5}$$

for the moment of order $k$ about 0. Here $a_k$ will be called the sample moment of order $k$. In this notation

$$a_1 = n^{-1}\sum_{j=1}^n X_j = \bar X. \tag{6}$$

The sample central moment is defined by

$$b_k = n^{-1}\sum_{j=1}^n (X_j - a_1)^k = n^{-1}\sum_{j=1}^n (X_j - \bar X)^k. \tag{7}$$

Clearly,

$$b_1 = 0 \quad \text{and} \quad b_2 = \frac{n-1}{n}S^2.$$

As mentioned earlier, we do not call $b_2$ the sample variance. $S^2$ will be referred to as the sample variance, for reasons that will subsequently become clear. We have

$$b_2 = a_2 - a_1^2. \tag{8}$$

For the MGF of DF $F_n^*(x)$, we have

$$M^*(t) = n^{-1}\sum_{j=1}^n e^{tX_j}. \tag{9}$$


Similar definitions are made for sample moments of bivariate and multivariate distributions. For example, if $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ is a sample from a bivariate distribution, we write

$$\bar X = n^{-1}\sum_{j=1}^n X_j \quad \text{and} \quad \bar Y = n^{-1}\sum_{j=1}^n Y_j \tag{10}$$

for the two sample means, and for the second-order sample central moments we write

$$b_{20} = n^{-1}\sum_{j=1}^n (X_j - \bar X)^2, \quad b_{02} = n^{-1}\sum_{j=1}^n (Y_j - \bar Y)^2, \quad \text{and} \quad b_{11} = n^{-1}\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y). \tag{11}$$

Once again we write

$$S_1^2 = (n-1)^{-1}\sum_{j=1}^n (X_j - \bar X)^2 \quad \text{and} \quad S_2^2 = (n-1)^{-1}\sum_{j=1}^n (Y_j - \bar Y)^2 \tag{12}$$

for the two sample variances, and for the sample covariance we use the quantity

$$S_{11} = (n-1)^{-1}\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y). \tag{13}$$

In particular, the sample correlation coefficient is defined by

$$R = \frac{S_{11}}{S_1 S_2}. \tag{14}$$


It can be shown (Problem 4) that $|R| \le 1$; the extreme values $\pm 1$ can occur only when all sample points $(X_1, Y_1), \dots, (X_n, Y_n)$ lie on a straight line.

The sample quantiles are defined in a similar manner. Thus, if $0 < p < 1$, the sample quantile of order $p$, denoted by $Z_p$, is the order statistic $X_{(r)}$, where

$$r = \begin{cases} np & \text{if } np \text{ is an integer}, \\ [np + 1] & \text{if } np \text{ is not an integer}. \end{cases}$$

As usual, $[x]$ is the largest integer $\le x$. Note that if $np$ is an integer, we can take any value between $X_{(np)}$ and $X_{(np+1)}$ as the $p$th sample quantile. Thus, if $p = \frac{1}{2}$ and $n$ is even, we can take any value between $X_{(n/2)}$ and $X_{((n/2)+1)}$, the two middle values, as the median. It is customary to take the average. Thus the sample median is defined as

$$Z_{1/2} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd}, \\ \dfrac{X_{(n/2)} + X_{((n/2)+1)}}{2} & \text{if } n \text{ is even}. \end{cases} \tag{15}$$

Note that $\left[\frac{n}{2} + 1\right] = \frac{n+1}{2}$ if $n$ is odd.


Example 1. A random sample of 25 observations is taken from the interval (0,1): 


0.50 0.24 0.89 0.54 0.34 0.89 0.92 0.17 0.32 0.80 
0.06 0.21 0.58 0.07 0.56 0.20 0.31 0.17 0.41 0.38 
0.88 0.61 0.35 0.06 0.90 


In order to compute $F_{25}^*$, the first step is to order the observations from smallest to largest. The ordered sample is


0.06, 0.06, 0.07, 0.17, 0.17, 0.20, 0.21, 0.24, 0.31, 0.32,
0.34, 0.35, 0.38, 0.41, 0.50, 0.54, 0.56, 0.58, 0.61, 0.80,
0.88, 0.89, 0.89, 0.90, 0.92


Then the empirical DF is given by

$$F_{25}^*(x) = \begin{cases} 0, & x < 0.06 \\ 2/25, & 0.06 \le x < 0.07 \\ 3/25, & 0.07 \le x < 0.17 \\ 5/25, & 0.17 \le x < 0.20 \\ \quad\vdots \\ 24/25, & 0.90 \le x < 0.92 \\ 1, & x \ge 0.92 \end{cases}$$

Fig. 1. Empirical DF for data of Example 1.


A plot of $F_{25}^*$ is shown in Fig. 1. The sample mean and variance are

$$\bar x = 0.45, \quad s^2 = 0.084, \quad \text{and} \quad s = 0.29.$$

Also, the sample median is the 13th observation in the ordered sample, namely, $z_{1/2} = 0.38$, and if $p = 0.2$, then $np = 5$ and $z_{0.2} = 0.17$.
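The computations of Example 1 can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the function names `ecdf` and `sample_quantile` are illustrative choices, not notation from the text, and the quantile rule implemented is the one stated above ($r = np$ if $np$ is an integer, $r = [np + 1]$ otherwise).

```python
from bisect import bisect_right
from fractions import Fraction
from statistics import mean, variance

# The 25 observations of Example 1.
data = [0.50, 0.24, 0.89, 0.54, 0.34, 0.89, 0.92, 0.17, 0.32, 0.80,
        0.06, 0.21, 0.58, 0.07, 0.56, 0.20, 0.31, 0.17, 0.41, 0.38,
        0.88, 0.61, 0.35, 0.06, 0.90]
ordered = sorted(data)
n = len(ordered)


def ecdf(x):
    """Empirical DF: fraction of observations that are <= x."""
    return bisect_right(ordered, x) / n


def sample_quantile(p):
    """Order statistic X_(r), r = np if np is an integer, else [np + 1].

    p is passed as a Fraction so that the integer test on np is exact.
    """
    np_ = n * Fraction(p)
    r = int(np_) if np_.denominator == 1 else int(np_) + 1
    return ordered[r - 1]


x_bar = mean(data)        # sample mean
s2 = variance(data)       # sample variance with divisor n - 1
median = sample_quantile(Fraction(1, 2))
z_02 = sample_quantile(Fraction(1, 5))
```

Passing $p$ as an exact fraction avoids a floating-point misclassification of whether $np$ is an integer; that is a design choice of this sketch.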


Next we consider the moments of sample characteristics. In the following we write $EX^k = m_k$ and $E(X - \mu)^k = \mu_k$ for the $k$th-order population moments. Whenever we use $m_k$ (or $\mu_k$), it will be assumed to exist. Also, $\sigma^2$ represents the population variance.


Theorem 3. Let $X_1, X_2, \dots, X_n$ be a sample from a population with DF $F$. Then

$$E\bar X = \mu, \tag{16}$$

$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n}, \tag{17}$$

$$E(\bar X)^3 = \frac{m_3 + 3(n-1)m_2\mu + (n-1)(n-2)\mu^3}{n^2}, \tag{18}$$

and

$$E(\bar X)^4 = \frac{m_4 + 4(n-1)m_3\mu + 6(n-1)(n-2)m_2\mu^2 + 3(n-1)m_2^2}{n^3} + \frac{(n-1)(n-2)(n-3)\mu^4}{n^3}. \tag{19}$$


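Proof. In view of Theorems 4.5.3 and 4.5.7, it suffices to prove (18) and (19). We have

$$\left(\sum_{j=1}^n X_j\right)^3 = \sum_{j=1}^n X_j^3 + 3\sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l,$$

and (18) follows. Similarly,

$$\left(\sum_{i=1}^n X_i\right)^4 = \left(\sum_{i=1}^n X_i\right)\left(\sum_{j=1}^n X_j^3 + 3\sum_{j \ne k} X_j^2 X_k + \sum_{j \ne k \ne l} X_j X_k X_l\right)$$

$$= \sum_{i=1}^n X_i^4 + 4\sum_{i \ne k} X_i X_k^3 + 3\sum_{i \ne k} X_i^2 X_k^2 + 6\sum_{i \ne j \ne k} X_i^2 X_j X_k + \sum_{i \ne j \ne k \ne l} X_i X_j X_k X_l,$$

and (19) follows.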


Theorem 4. For the third and fourth central moments of $\bar X$, we have

$$\mu_3(\bar X) = \frac{\mu_3}{n^2} \tag{20}$$

and

$$\mu_4(\bar X) = \frac{\mu_4}{n^3} + \frac{3(n-1)\mu_2^2}{n^3}. \tag{21}$$
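Proof. We have

$$\mu_3(\bar X) = E(\bar X - \mu)^3 = \frac{1}{n^3}E\left\{\sum_{i=1}^n (X_i - \mu)\right\}^3 = \frac{1}{n^3}\sum_{i=1}^n E(X_i - \mu)^3 = \frac{\mu_3}{n^2},$$

and

$$\mu_4(\bar X) = E(\bar X - \mu)^4 = \frac{1}{n^4}E\left\{\sum_{i=1}^n (X_i - \mu)\right\}^4$$

$$= \frac{1}{n^4}\sum_{i=1}^n E(X_i - \mu)^4 + \binom{4}{2}\frac{1}{n^4}\sum_{i<j} E\{(X_i - \mu)^2 (X_j - \mu)^2\}$$

$$= \frac{\mu_4}{n^3} + \frac{3(n-1)}{n^3}\mu_2^2.$$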

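Theorem 5. For the moments of $b_2$, we have

$$E(b_2) = \frac{(n-1)\sigma^2}{n}, \tag{22}$$

$$\operatorname{var}(b_2) = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3}, \tag{23}$$

$$E(b_3) = \frac{(n-1)(n-2)}{n^2}\mu_3, \tag{24}$$

and

$$E(b_4) = \frac{(n-1)(n^2 - 3n + 3)}{n^3}\mu_4 + \frac{3(n-1)(2n-3)}{n^3}\mu_2^2. \tag{25}$$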

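Proof. We have

$$Eb_2 = \frac{1}{n}E\left[\sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2\right] = \frac{1}{n}(n\sigma^2 - \sigma^2) = \frac{n-1}{n}\sigma^2.$$

Now

$$n^2 b_2^2 = \left[\sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2\right]^2.$$

Writing $Y_i = X_i - \mu$, we see that $EY_i = 0$, $\operatorname{var}(Y_i) = \sigma^2$, and $EY_i^4 = \mu_4$. Expanding the square and discarding every product that contains some $Y_i$ to the first power only (by independence its expectation is 0), we have

$$n^2 Eb_2^2 = E\left(\sum_{i=1}^n Y_i^2 - n\bar Y^2\right)^2$$

$$= E\left[\sum_{i} Y_i^4 + \sum_{i \ne j} Y_i^2 Y_j^2 - \frac{2}{n}\left(\sum_{i} Y_i^4 + \sum_{i \ne j} Y_i^2 Y_j^2\right) + \frac{1}{n^2}\left(\sum_{i} Y_i^4 + 3\sum_{i \ne j} Y_i^2 Y_j^2\right)\right].$$

It follows that

$$n^2 Eb_2^2 = n\mu_4 + n(n-1)\sigma^4 - \frac{2}{n}\left[n(n-1)\sigma^4 + n\mu_4\right] + \frac{1}{n^2}\left[3n(n-1)\sigma^4 + n\mu_4\right]$$

$$= \left(n - 2 + \frac{1}{n}\right)\mu_4 + \left(n - 2 + \frac{3}{n}\right)(n-1)\mu_2^2 \qquad (\mu_2 = \sigma^2).$$

Therefore,

$$\operatorname{var}(b_2) = Eb_2^2 - (Eb_2)^2$$

$$= \left(n - 2 + \frac{1}{n}\right)\frac{\mu_4}{n^2} + (n-1)\left(n - 2 + \frac{3}{n}\right)\frac{\mu_2^2}{n^2} - \left(\frac{n-1}{n}\right)^2\mu_2^2$$

$$= \left(n - 2 + \frac{1}{n}\right)\frac{\mu_4}{n^2} + \frac{(n-1)(3-n)}{n^3}\mu_2^2,$$

as asserted.
Relations (24) and (25) can be proved similarly.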


Corollary 1. $ES^2 = \sigma^2$.
This is precisely the reason why we call $S^2$, and not $b_2$, the sample variance.


Corollary 2. $\operatorname{var}(S^2) = \dfrac{\mu_4}{n} + \dfrac{3-n}{n(n-1)}\mu_2^2$.
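The two corollaries can be checked exactly on a small discrete population. The sketch below is an illustration only: the population (one roll of a fair die) and the sample size $n = 4$ are assumptions chosen so that all $6^4$ equally likely samples can be enumerated with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]   # assumed population: one roll of a fair die
n = 4                        # assumed sample size

# Population central moments mu_2 (= sigma^2) and mu_4.
mu = Fraction(sum(faces), len(faces))
mu2 = sum((x - mu) ** 2 for x in faces) / len(faces)
mu4 = sum((x - mu) ** 4 for x in faces) / len(faces)


def sample_variance(sample):
    """S^2 with divisor n - 1, computed exactly."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample) / (len(sample) - 1)


# Every ordered sample of size n is equally likely under iid sampling.
values = [sample_variance(s) for s in product(faces, repeat=n)]
e_s2 = sum(values) / len(values)
var_s2 = sum(v * v for v in values) / len(values) - e_s2 ** 2

# Corollary 2: var(S^2) = mu_4 / n + (3 - n) mu_2^2 / (n (n - 1)).
corollary_2 = mu4 / n + (3 - n) * mu2 ** 2 / (n * (n - 1))
```

Because `Fraction` arithmetic is exact, `e_s2` and `var_s2` can be compared with `mu2` and `corollary_2` by equality rather than within a tolerance.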


Remark 1. The results of Theorems 3 to 5 can easily be modified and stated for the case when the $X_i$'s are exchangeable RVs. Thus (16) holds and (17) has to be modified to

$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n} + \frac{n-1}{n}\rho\sigma^2, \tag{17'}$$

where $\rho$ is the correlation coefficient between $X_i$ and $X_j$. The expressions for $(\sum X_j)^3$ and $(\sum X_j)^4$ in the proof of Theorem 3 still hold, but both (18) and (19) need appropriate modification. For example, (18) changes to

$$E\bar X^3 = \frac{m_3 + 3(n-1)E(X_j^2 X_k) + (n-1)(n-2)E(X_j X_k X_l)}{n^2}. \tag{18'}$$

Let us show how Corollary 1 changes for exchangeable RVs. Clearly,

$$(n-1)S^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2,$$

so that

$$(n-1)ES^2 = n\sigma^2 - nE(\bar X - \mu)^2 = n\sigma^2 - \left[\sigma^2 + (n-1)\rho\sigma^2\right]$$

in view of (17'). It follows that

$$ES^2 = \sigma^2(1 - \rho).$$

We note that $E(S^2 - \sigma^2) = -\rho\sigma^2$, and moreover, from Problem 4.5.19 [or from (17')] we note that $\rho \ge -1/(n-1)$, so that $1 - \rho \le n/(n-1)$, and hence

$$0 \le ES^2 \le \frac{n}{n-1}\sigma^2.$$


Remark 2. In simple random sampling from a (finite) population of size $N$, we note that when $n = N$, $\bar X = \mu$, which is a constant, so that (17') reduces to

$$0 = \frac{\sigma^2}{N} + \frac{N-1}{N}\rho\sigma^2,$$

so that $\rho = -1/(N-1)$. It follows that

$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n}\left[1 - \frac{n-1}{N-1}\right] = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}. \tag{17''}$$

The factor $(N-n)/(N-1)$ in (17'') is called the finite population correction factor. As $N \to \infty$, with $n$ fixed, $(N-n)/(N-1) \to 1$, so that the expression for $\operatorname{var}(\bar X)$ in (17'') approaches that in (17).
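The finite population correction can be verified by brute force on a small population. In the sketch below the six population values and the sample size are arbitrary illustrative assumptions; $\sigma^2$ is the variance of a single draw, that is, the population variance with divisor $N$.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 3, 5, 7, 11, 13]   # assumed finite population
N, n = len(population), 3           # assumed sample size n = 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # variance of one draw

# Under simple random sampling without replacement every subset of
# size n is equally likely, so we average over all C(N, n) subsets.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
e_mean = sum(means) / len(means)
var_mean = sum((m - e_mean) ** 2 for m in means) / len(means)

fpc = Fraction(N - n, N - 1)        # finite population correction factor
predicted = fpc * sigma2 / n        # variance of the sample mean with correction
```

Setting `n = N` in this sketch gives a single subset and a zero variance, which is the observation the remark starts from.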
The following result provides a justification for our definition of sample covariance.


Theorem 6. Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be a sample from a bivariate population with variances $\sigma_1^2$, $\sigma_2^2$ and covariance $\rho\sigma_1\sigma_2$. Then

$$ES_1^2 = \sigma_1^2, \quad ES_2^2 = \sigma_2^2, \quad \text{and} \quad ES_{11} = \rho\sigma_1\sigma_2, \tag{26}$$

where $S_1^2$, $S_2^2$, and $S_{11}$ are defined in (12) and (13).

Proof. It follows from Corollary 1 to Theorem 5 that $ES_1^2 = \sigma_1^2$ and $ES_2^2 = \sigma_2^2$. To prove that $ES_{11} = \rho\sigma_1\sigma_2$, we note that $X_i$ is independent of $X_j$ ($i \ne j$) and $Y_j$ ($i \ne j$). We have

$$(n-1)ES_{11} = E\left\{\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y)\right\}.$$

Now

$$E\{(X_j - \bar X)(Y_j - \bar Y)\} = E\left\{X_j Y_j - X_j\frac{\sum_k Y_k}{n} - Y_j\frac{\sum_k X_k}{n} + \frac{\sum_k X_k \sum_k Y_k}{n^2}\right\}$$

$$= E(XY) - \frac{1}{n}[E(XY) + (n-1)EX\,EY] - \frac{1}{n}[E(XY) + (n-1)EX\,EY] + \frac{1}{n^2}[nE(XY) + n(n-1)EX\,EY]$$

$$= \frac{n-1}{n}[E(XY) - EX\,EY],$$

and it follows that

$$(n-1)ES_{11} = n\left(\frac{n-1}{n}\right)[E(XY) - EX\,EY],$$

that is,

$$ES_{11} = E(XY) - EX\,EY = \operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2,$$

as asserted.


We next turn our attention to the distributions of sample characteristics. Several possibilities exist. If the exact sampling distribution is required, the method of transformation described in Section 4.4 can be used. Sometimes the technique of MGF or CF can be applied. Thus, if $X_1, X_2, \dots, X_n$ is a random sample from a population distribution for which the MGF exists, the MGF of the sample mean $\bar X$ is given by

$$M_{\bar X}(t) = \prod_{j=1}^n Ee^{tX_j/n} = \left[M\left(\frac{t}{n}\right)\right]^n, \tag{27}$$

where $M$ is the MGF of the population distribution. If $M_{\bar X}(t)$ has one of the known forms, it is possible to write the PDF of $\bar X$. Although this method has the obvious drawback that it applies only to distributions for which all moments exist, we will see in Section 7.6 its effectiveness in the important case of sampling from a normal population where this condition is satisfied. An analog of (27) holds for CFs without any condition on existence of moments. Indeed,

$$\phi_{\bar X}(t) = \prod_{j=1}^n Ee^{itX_j/n} = \left[\phi\left(\frac{t}{n}\right)\right]^n, \tag{28}$$

where $\phi$ is the CF of $X_j$.


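Example 2. Let $X_1, X_2, \dots, X_n$ be a sample from a $G(\alpha, 1)$ distribution. We will compute the PDF of $\bar X$. We have

$$M_{\bar X}(t) = \left[M\left(\frac{t}{n}\right)\right]^n = \frac{1}{(1 - t/n)^{\alpha n}}, \qquad \frac{t}{n} < 1,$$

so that $\bar X$ is a $G(\alpha n, 1/n)$ variate.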


Example 3. Let $X_1, X_2, \dots, X_n$ be a random sample from a uniform distribution on $(0, 1)$. Consider the geometric mean

$$Y_n = \left(\prod_{i=1}^n X_i\right)^{1/n}.$$

We have $\log Y_n = (1/n)\sum_{i=1}^n \log X_i$, so that $\log Y_n$ is the mean of $\log X_1, \dots, \log X_n$.

The common PDF of $\log X_1, \dots, \log X_n$ is

$$f(x) = \begin{cases} e^x & \text{if } x < 0, \\ 0 & \text{otherwise}, \end{cases}$$

which is the negative exponential distribution with parameter $\beta = 1$. We see that the MGF of $\log Y_n$ is given by

$$M(t) = \prod_{i=1}^n Ee^{t\log X_i/n} = \frac{1}{(1 + t/n)^n},$$

and the PDF of $\log Y_n$ is given by

$$f^*(x) = \begin{cases} \dfrac{n^n}{\Gamma(n)}(-x)^{n-1}e^{nx}, & -\infty < x < 0, \\ 0, & \text{otherwise}. \end{cases}$$

It follows that $Y_n$ has PDF

$$f_{Y_n}(y) = \begin{cases} \dfrac{n^n}{\Gamma(n)}\,y^{n-1}(-\log y)^{n-1}, & 0 < y < 1, \\ 0, & \text{otherwise}. \end{cases}$$


Example 4 (Hogben [43]). Let $X_1, X_2, \dots, X_n$ be a random sample from a Bernoulli distribution with parameter $p$, $0 < p < 1$. Let $\bar X$ be the sample mean and $S^2$ the sample variance. We will find the PMF of $S^2$. Note that $S_n = \sum_{i=1}^n X_i = \sum_{i=1}^n X_i^2$ and that $S_n$ is $b(n, p)$. Since

$$(n-1)S^2 = \sum_{i=1}^n X_i^2 - n(\bar X)^2 = \frac{S_n(n - S_n)}{n},$$

$S^2$ only assumes values of the form

$$t = \frac{i(n-i)}{n(n-1)}, \qquad i = 0, 1, 2, \dots, \left[\frac{n}{2}\right],$$

where $[x]$ is the largest integer $\le x$. Thus

$$P\{S^2 = t\} = P\{nS_n - S_n^2 = i(n-i)\} = P\left\{\left(S_n - \frac{n}{2}\right)^2 = \left(i - \frac{n}{2}\right)^2\right\}$$

$$= P\{S_n = i \text{ or } S_n = n - i\}$$

$$= \binom{n}{i}p^i(1-p)^{n-i} + \binom{n}{n-i}p^{n-i}(1-p)^i$$

$$= \binom{n}{i}p^i(1-p)^i\left\{(1-p)^{n-2i} + p^{n-2i}\right\}, \qquad i < \frac{n}{2}.$$

If $n$ is even, $n = 2m$, say, where $m > 0$ is an integer, and $i = m$, then

$$P\left\{S^2 = \frac{m}{2(2m-1)}\right\} = \binom{2m}{m}p^m(1-p)^m.$$

In particular, if $n = 7$, $S^2 = 0, \frac{1}{7}, \frac{5}{21}$, and $\frac{2}{7}$ with probabilities $\{p^7 + (1-p)^7\}$, $7p(1-p)\{p^5 + (1-p)^5\}$, $21p^2(1-p)^2\{p^3 + (1-p)^3\}$, and $35p^3(1-p)^3$, respectively. If $n = 6$, then $S^2 = 0, \frac{1}{6}, \frac{4}{15}$, and $\frac{3}{10}$, with probabilities $\{p^6 + (1-p)^6\}$, $6p(1-p)\{p^4 + (1-p)^4\}$, $15p^2(1-p)^2\{p^2 + (1-p)^2\}$, and $20p^3(1-p)^3$, respectively.
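The PMF of $S^2$ in Example 4 can be cross-checked by enumerating all $2^n$ Bernoulli samples. The following is a sketch under stated assumptions: the helper names are hypothetical, and $p = 1/3$ is an arbitrary test value; exact rational arithmetic is used so the two PMFs can be compared by equality.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb


def s2_pmf_enumerated(n, p):
    """PMF of S^2 obtained by listing every 0-1 sample of size n."""
    pmf = defaultdict(Fraction)
    for xs in product((0, 1), repeat=n):
        k = sum(xs)                                   # value of S_n
        s2 = Fraction(k * (n - k), n * (n - 1))       # (n-1)S^2 = S_n(n-S_n)/n
        pmf[s2] += p ** k * (1 - p) ** (n - k)
    return dict(pmf)


def s2_pmf_formula(n, p):
    """PMF of S^2 from the closed form of Example 4."""
    q = 1 - p
    pmf = {}
    for i in range(n // 2 + 1):
        t = Fraction(i * (n - i), n * (n - 1))
        if 2 * i == n:                                # middle value, n even
            pmf[t] = comb(n, i) * p ** i * q ** i
        else:
            pmf[t] = comb(n, i) * p ** i * q ** i * (q ** (n - 2 * i) + p ** (n - 2 * i))
    return pmf


p = Fraction(1, 3)
pmf6 = s2_pmf_formula(6, p)
pmf7 = s2_pmf_formula(7, p)
```

For $n = 6$ the largest value $3/10$ carries probability $\binom{6}{3}p^3(1-p)^3 = 20p^3(1-p)^3$, in agreement with the general formula for $i = m$.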


We have already considered the distribution of the sample quantiles in Section 4.7 and the distribution of range $X_{(n)} - X_{(1)}$ in Example 4.7.4. It can be shown without much difficulty that the distribution of the sample median is given by

$$f_r(y) = \frac{n!}{(r-1)!\,(n-r)!}[F(y)]^{r-1}[1 - F(y)]^{n-r}f(y) \qquad \text{if } r = \frac{n+1}{2}, \tag{29}$$

where $F$ and $f$ are the population DF and PDF, respectively. If $n = 2m$ and the median is taken as the average of $X_{(m)}$ and $X_{(m+1)}$, then

$$f(y) = \frac{2(2m)!}{[(m-1)!]^2}\int_y^\infty [F(2y - v)]^{m-1}[1 - F(v)]^{m-1}f(2y - v)f(v)\,dv. \tag{30}$$


Example 5. Let $X_1, X_2, \dots, X_n$ be a random sample from $U(0, 1)$. Then the integrand in (30) is positive for the intersection of the regions $0 < 2y - v < 1$ and $0 < v < 1$. This gives $v/2 < y < (v+1)/2$, $y < v$, and $0 < v < 1$. The shaded area in Fig. 2 gives the limits on the integral as

$$y < v < 2y \quad \text{if } 0 < y \le \tfrac{1}{2},$$

and

$$y < v < 1 \quad \text{if } \tfrac{1}{2} < y < 1.$$

Fig. 2. $\{y < v < 2y,\ 0 < y \le \tfrac{1}{2},\ \text{and}\ y < v < 1,\ \tfrac{1}{2} < y < 1\}$.

In particular, if $m = 2$, the PDF of the median, $(X_{(2)} + X_{(3)})/2$, is given by

$$f(y) = \begin{cases} 8y^2(3 - 4y) & \text{if } 0 < y < \tfrac{1}{2}, \\ 8(4y^3 - 9y^2 + 6y - 1) & \text{if } \tfrac{1}{2} < y < 1, \\ 0 & \text{otherwise}. \end{cases}$$


In Section 7.5 we study large-sample theory techniques to approximate distribu- 
tions of sample statistics when n is large. 


PROBLEMS 7.3 


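1. Let $X_1, X_2, \dots, X_n$ be a random sample from a DF $F$, and let $F_n^*(x)$ be the sample distribution function. Find $\operatorname{cov}(F_n^*(x), F_n^*(y))$ for fixed real numbers $x$, $y$.


2. Let $F_n^*$ be the empirical DF of a random sample from DF $F$. Show that

$$P\left\{|F_n^*(x) - F(x)| \ge \frac{\varepsilon}{2\sqrt{n}}\right\} \le \frac{1}{\varepsilon^2} \qquad \text{for all } \varepsilon > 0.$$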


3. For the data of Example 7.2.2, compute the sample distribution function. 


4. (a) Show that the sample correlation coefficient $R$ satisfies $|R| \le 1$ with equality if and only if all sample points lie on a straight line.
(b) If we write $U_i = aX_i + b$ ($a \ne 0$) and $V_i = cY_i + d$ ($c \ne 0$), what is the sample correlation coefficient between the $U$'s and the $V$'s?


5. (a) A sample of size 2 is taken from the PDF $f(x) = 1$, $0 \le x \le 1$, and $= 0$ otherwise. Find $P(\bar X \ge 0.9)$.
(b) A sample of size 2 is taken from $b(1, p)$. Find (i) $P(\bar X \le p)$, and (ii) $P(S^2 \ge 0.5)$.


6. Let $X_1, X_2, \dots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^2)$. Compute the first four sample moments of $\bar X$ about the origin and about the mean. Also compute the first four sample moments of $S^2$ about the mean.


7. Derive the PDF of the median given in (29) and (30). 


8. Let $U_{(1)}, U_{(2)}, \dots, U_{(n)}$ be the order statistics of a sample of size $n$ from $U(0, 1)$. Compute $EU_{(r)}^k$ for any $1 \le r \le n$ and integer $k$ ($> 0$). In particular, show that

$$EU_{(r)} = \frac{r}{n+1} \quad \text{and} \quad \operatorname{var}(U_{(r)}) = \frac{r(n-r+1)}{(n+1)^2(n+2)}.$$

Show also that the correlation coefficient between $U_{(r)}$ and $U_{(s)}$ for $1 \le r < s \le n$ is given by $[r(n-s+1)/s(n-r+1)]^{1/2}$.


9. Let $X_1, X_2, \dots, X_n$ be $n$ independent observations on $X$. Find the sampling distribution of $\bar X$, the sample mean, if (a) $X \sim P(\lambda)$, (b) $X \sim C(1, 0)$, and (c) $X \sim \chi^2(m)$.


10. Let $X_1, X_2, \dots, X_n$ be a random sample from $G(\alpha, \beta)$. Let us write $Y_n = (\bar X - \alpha\beta)/(\beta\sqrt{\alpha/n})$, $n = 1, 2, \dots$.

(a) Compute the first four moments of $Y_n$, and compare them with the first four moments of the standard normal distribution.

(b) Compute the coefficients of skewness $\alpha_3$ and of kurtosis $\alpha_4$ for the RVs $Y_n$. (For definitions of $\alpha_3$, $\alpha_4$, see Problem 3.2.10.)


11. Let $X_1, X_2, \dots, X_n$ be a random sample from $U[0, 1]$. Also let $Z_n = (\bar X - 0.5)/\sqrt{1/(12n)}$. Repeat Problem 10 for the sequence $Z_n$.


12. Let $X_1, X_2, \dots, X_n$ be a random sample from $P(\lambda)$. Find $\operatorname{var}(S^2)$, and compare it with $\operatorname{var}(\bar X)$. Note that $E\bar X = \lambda = ES^2$. (Hint: Use Problem 3.2.9.)


13. Prove (24) and (25).


14. Multiple RVs $X_1, X_2, \dots, X_n$ are exchangeable if the $n!$ permutations $(X_{i_1}, X_{i_2}, \dots, X_{i_n})$ have the same multidimensional distribution. Consider the special case when the $X$'s are two-dimensional. Find an analog of Theorem 6 for exchangeable bivariate RVs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$.


7.4 CHI-SQUARE, t-, AND F-DISTRIBUTIONS: EXACT 
SAMPLING DISTRIBUTIONS 


In this section we investigate certain distributions that arise in sampling from a normal population. Let $X_1, X_2, \dots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then we know that $\bar X \sim \mathcal{N}(\mu, \sigma^2/n)$. Also, $\{\sqrt{n}(\bar X - \mu)/\sigma\}^2$ is $\chi^2(1)$. We determine the distribution of $S^2$ in the next section. Here we define mainly chi-square, t-, and F-distributions and study their properties. Their importance will become evident in the next section and later in the testing of statistical hypotheses (Chapter 10).

The first distribution of interest is the chi-square distribution, defined in Chapter 5 as a special case of the gamma distribution. Let $n > 0$ be an integer. Then $G(n/2, 2)$ is a $\chi^2(n)$ RV. In view of Theorem 5.3.29 and Corollary 2 to Theorem 5.3.4, the following result holds.


Theorem 1. Let $X_1, X_2, \dots, X_n$ be iid RVs, and let $S_n = \sum_{k=1}^n X_k$. Then

(a) $S_n \sim \chi^2(n) \Leftrightarrow X_1 \sim \chi^2(1)$, and

(b) $X_1 \sim \mathcal{N}(0, 1) \Rightarrow \sum_{k=1}^n X_k^2 \sim \chi^2(n)$.
If $X$ has a chi-square distribution with $n$ d.f., we write $X \sim \chi^2(n)$. We recall that if $X \sim \chi^2(n)$, its PDF is given by

$$f(x) = \begin{cases} \dfrac{x^{n/2-1}e^{-x/2}}{2^{n/2}\Gamma(n/2)} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \tag{1}$$

the MGF by

$$M(t) = (1 - 2t)^{-n/2} \qquad \text{for } t < \tfrac{1}{2}, \tag{2}$$

and the mean and the variance by

$$EX = n \quad \text{and} \quad \operatorname{var}(X) = 2n. \tag{3}$$

The $\chi^2(n)$ distribution is tabulated for values of $n = 1, 2, \dots$. Tables usually go up to $n = 30$, since for $n > 30$ it is possible to use normal approximation. In Fig. 1 we plot the PDF (1) for selected values of $n$.

We will write $\chi^2_{n,\alpha}$ for the upper $\alpha$ percent point of the $\chi^2(n)$ distribution, that is,

$$P\{\chi^2(n) > \chi^2_{n,\alpha}\} = \alpha. \tag{4}$$

Table ST3 at the end of the book gives the values of $\chi^2_{n,\alpha}$ for some selected values of $n$ and $\alpha$.

Fig. 1. Chi-square densities.
Example 1. Let $n = 25$. Then, from Table ST3,

$$P\{\chi^2(25) \le 34.382\} = 0.90.$$

Let us approximate this probability using CLT. We see that $E\chi^2(25) = 25$, $\operatorname{var}\chi^2(25) = 50$, so that

$$P\{\chi^2(25) \le 34.382\} = P\left\{\frac{\chi^2(25) - 25}{\sqrt{50}} \le \frac{34.382 - 25}{5\sqrt{2}}\right\} \approx P\{Z \le 1.32\} = 0.9066.$$
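The comparison in Example 1 can be reproduced numerically without tables. The sketch below is illustrative: it integrates the PDF (1) with a hand-written composite Simpson rule (an assumption of this sketch, chosen to stay within the standard library) and evaluates the standard normal DF through `math.erf`.

```python
from math import erf, exp, gamma, sqrt


def chi2_pdf(x, n):
    """PDF (1) of the chi-square distribution with n d.f."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * exp(-x / 2) / (2 ** (n / 2) * gamma(n / 2))


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def std_normal_cdf(z):
    """DF of N(0, 1) expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, x = 25, 34.382
exact = simpson(lambda u: chi2_pdf(u, n), 0.0, x)   # P{chi2(25) <= 34.382}
z = (x - n) / sqrt(2 * n)                           # standardized value
approx = std_normal_cdf(z)                          # CLT approximation
```

Here `z` is about 1.327; the text rounds it to 1.32 before using the normal table, which accounts for a small difference between `approx` and 0.9066.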

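Definition 1. Let $X_1, X_2, \dots, X_n$ be independent normal RVs with $EX_i = \mu_i$ and $\operatorname{var}(X_i) = \sigma^2$, $i = 1, 2, \dots, n$. Also, let $Y = \sum_{i=1}^n X_i^2/\sigma^2$. The RV $Y$ is said to be a noncentral chi-square RV with noncentrality parameter $\sum_{i=1}^n \mu_i^2/\sigma^2$ and $n$ d.f. We will write $Y \sim \chi^2(n, \delta)$, where $\delta = \sum_{i=1}^n \mu_i^2/\sigma^2$.


Although the PDF of a $\chi^2(n, \delta)$ RV is hard to compute (see Problem 16), its MGF is easily evaluated. We have

$$M(t) = E\exp\left\{t\sum_{i=1}^n X_i^2/\sigma^2\right\} = \prod_{i=1}^n E\exp\left(tX_i^2/\sigma^2\right),$$

where $X_i \sim \mathcal{N}(\mu_i, \sigma^2)$. Thus

$$E\exp\left(tX_i^2/\sigma^2\right) = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2}\right\}dx_i,$$

where the integral exists for $t < \frac{1}{2}$. In the integrand we complete squares, and after some simple algebra we obtain

$$E\exp\left(tX_i^2/\sigma^2\right) = \frac{1}{\sqrt{1 - 2t}}\exp\left\{\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right\}, \qquad t < \frac{1}{2}.$$

It follows that

$$M(t) = (1 - 2t)^{-n/2}\exp\left(\frac{t\sum_{i=1}^n \mu_i^2}{\sigma^2(1 - 2t)}\right), \qquad t < \frac{1}{2}, \tag{5}$$

and the MGF of a $\chi^2(n, \delta)$ RV is therefore

$$M(t) = (1 - 2t)^{-n/2}\exp\left(\frac{t\delta}{1 - 2t}\right), \qquad t < \frac{1}{2}. \tag{6}$$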
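It is immediate that if $Y_1, Y_2, \dots, Y_k$ are independent, $Y_i \sim \chi^2(n_i, \delta_i)$, $i = 1, 2, \dots, k$, then $\sum_{i=1}^k Y_i$ is $\chi^2\left(\sum_{i=1}^k n_i, \sum_{i=1}^k \delta_i\right)$.
The mean and variance of $\chi^2(n, \delta)$ are easy to calculate. We have

$$EY = \frac{\sum_{i=1}^n EX_i^2}{\sigma^2} = \frac{\sum_{i=1}^n [\operatorname{var}(X_i) + (EX_i)^2]}{\sigma^2} = \frac{n\sigma^2 + \sum_{i=1}^n \mu_i^2}{\sigma^2} = n + \delta$$

and

$$\operatorname{var}(Y) = \operatorname{var}\left(\frac{\sum_{i=1}^n X_i^2}{\sigma^2}\right) = \frac{1}{\sigma^4}\sum_{i=1}^n \operatorname{var}(X_i^2) = \frac{1}{\sigma^4}\sum_{i=1}^n\left[EX_i^4 - (EX_i^2)^2\right]$$

$$= \frac{1}{\sigma^4}\sum_{i=1}^n\left[(3\sigma^4 + 6\sigma^2\mu_i^2 + \mu_i^4) - (\sigma^2 + \mu_i^2)^2\right]$$

$$= \frac{1}{\sigma^4}\left(2n\sigma^4 + 4\sigma^2\sum_{i=1}^n \mu_i^2\right) = 2n + 4\delta.$$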
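The values $n + \delta$ and $2n + 4\delta$ can be cross-checked through the Poisson-mixture form of the noncentral chi-square distribution, which the text defers to Problem 16: a $\chi^2(n, \delta)$ RV behaves like a $\chi^2(2J + n)$ RV with $J \sim P(\delta/2)$. The sketch below takes that representation as given; the truncation at 100 terms and the values $n = 4$, $\delta = 3$ are assumptions made for illustration.

```python
from math import exp


def poisson_weights(lam, terms=100):
    """First `terms` probabilities of a Poisson(lam) RV, built recursively."""
    weights, w = [], exp(-lam)
    for j in range(terms):
        weights.append(w)
        w *= lam / (j + 1)
    return weights


def noncentral_chi2_mean_var(n, delta, terms=100):
    """Mean and variance of chi2(n, delta) from the Poisson mixture.

    A central chi2(k) RV has mean k and second moment 2k + k^2.
    """
    a = poisson_weights(delta / 2, terms)
    mean = sum(a_j * (2 * j + n) for j, a_j in enumerate(a))
    second = sum(a_j * (2 * (2 * j + n) + (2 * j + n) ** 2)
                 for j, a_j in enumerate(a))
    return mean, second - mean ** 2


mean, var = noncentral_chi2_mean_var(n=4, delta=3.0)
```

With these inputs the mixture gives a mean close to $4 + 3 = 7$ and a variance close to $2 \cdot 4 + 4 \cdot 3 = 20$, up to truncation and rounding error.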


We next turn our attention to Student's t-statistic, which arises quite naturally in 
sampling from a normal population. 


Definition 2. Let $X \sim \mathcal{N}(0, 1)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then the statistic

$$T = \frac{X}{\sqrt{Y/n}} \tag{7}$$

is said to have a t-distribution with $n$ d.f. and we write $T \sim t(n)$.


Theorem 2. The PDF of $T$ defined in (7) is given by

$$f_n(t) = \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)\sqrt{n\pi}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty. \tag{8}$$

The proof is left as an exercise.


Remark 1. For $n = 1$, $T$ is a Cauchy RV. We will therefore assume that $n > 1$. For each $n$, we have a different PDF. In Fig. 2 we plot $f_n(t)$ for some selected values of $n$. Like the normal distribution, the t-distribution is important in the theory of statistics and hence is tabulated (Table ST4).
Fig. 2. Student's t-densities.


Remark 2. The PDF $f_n(t)$ is symmetric in $t$, and $f_n(t) \to 0$ as $t \to \pm\infty$. For large $n$, the t-distribution is close to the normal distribution. Indeed, $(1 + t^2/n)^{-(n+1)/2} \to e^{-t^2/2}$ as $n \to \infty$. Moreover, as $t \to \infty$ or $t \to -\infty$, the tails of $f_n(t) \to 0$ much more slowly than do the tails of the $\mathcal{N}(0, 1)$ PDF. Thus for small $n$ and large $t_0$,

$$P\{|T| > t_0\} \ge P\{|Z| > t_0\}, \qquad Z \sim \mathcal{N}(0, 1);$$

that is, there is more probability in the tail of the t-distribution than in the tail of the standard normal. In what follows we write $t_{n,\alpha/2}$ for the value (Fig. 3) of $T$ for which

$$P\{|T| > t_{n,\alpha/2}\} = \alpha. \tag{9}$$

In Table ST4 positive values of $t_{n,\alpha}$ are tabulated for selected values of $n$ and $\alpha$. Negative values may be obtained from symmetry, $t_{n,1-\alpha} = -t_{n,\alpha}$.


Example 2. Let $n = 5$. Then from Table ST4, we get $t_{5,0.025} = 2.571$ and $t_{5,0.05} = 2.015$. The corresponding values under the $\mathcal{N}(0, 1)$ distribution are $z_{0.025} = 1.96$ and $z_{0.05} = 1.65$. For $n = 30$,

$$t_{30,0.05} = 1.697 \quad \text{and} \quad z_{0.05} = 1.65.$$
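The table values quoted in Example 2 can be checked against the PDF (8) by numerical integration. This is a sketch only: the Simpson rule and the helper names are assumptions of the illustration, and the normal tail is obtained from `math.erf`.

```python
from math import erf, gamma, pi, sqrt


def t_pdf(t, n):
    """PDF (8) of Student's t-distribution with n d.f."""
    c = gamma((n + 1) / 2) / (gamma(n / 2) * sqrt(n * pi))
    return c * (1 + t * t / n) ** (-(n + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_two_sided_tail(t0, n):
    """P{|T| > t0} for T ~ t(n)."""
    return 1 - simpson(lambda u: t_pdf(u, n), -t0, t0)


def z_two_sided_tail(z0):
    """P{|Z| > z0} for Z ~ N(0, 1)."""
    return 1 - erf(z0 / sqrt(2))


tail_t5 = t_two_sided_tail(2.571, 5)    # should be close to 2 * 0.025
tail_z = z_two_sided_tail(1.96)         # should be close to 0.05
heavier = t_two_sided_tail(1.96, 5)     # t(5) tail beyond the normal cutoff
```

Evaluating the $t(5)$ tail at the normal cutoff 1.96 illustrates Remark 2: `heavier` exceeds `tail_z`, so the t-distribution puts more probability in the tails.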


Theorem 3. Let $X \sim t(n)$, $n > 1$. Then $EX^r$ exists for $r < n$. In particular, if $r < n$ is odd,

$$EX^r = 0, \tag{10}$$

and if $r < n$ is even,

$$EX^r = n^{r/2}\,\frac{\Gamma[(r+1)/2]\,\Gamma[(n-r)/2]}{\Gamma(1/2)\,\Gamma(n/2)}. \tag{11}$$

Fig. 3. Density of $t(n)$ with tail areas $\alpha/2$ beyond $-t_{n,\alpha/2}$ and $t_{n,\alpha/2}$.

Corollary. If $n > 2$, $EX = 0$ and $EX^2 = \operatorname{var}(X) = n/(n-2)$.

Remark 3. If in Definition 2 we take $X \sim \mathcal{N}(\mu, \sigma^2)$, $Y/\sigma^2 \sim \chi^2(n)$, and $X$ and $Y$ independent,

$$T = \frac{X}{\sqrt{Y/n}}$$

is said to have a noncentral t-distribution with parameter (also called noncentrality parameter) $\delta = \mu/\sigma$ and d.f. $n$. Various moments of noncentral t-distribution may be computed by using the fact that expectation of a product of independent RVs is the product of their expectations.

We leave the reader to show (Problem 3) that if $T$ has a noncentral t-distribution with $n$ d.f. and noncentrality parameter $\delta$, then

$$ET = \delta\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\sqrt{\frac{n}{2}}, \qquad n > 1, \tag{12}$$

and

$$\operatorname{var}(T) = \frac{n}{n-2}(1 + \delta^2) - \frac{\delta^2 n}{2}\left(\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\right)^2, \qquad n > 2. \tag{13}$$
Definition 3. Let $X$ and $Y$ be independent $\chi^2$ RVs with $m$ and $n$ d.f., respectively. The RV

$$F = \frac{X/m}{Y/n} \tag{14}$$

is said to have an F-distribution with $(m, n)$ d.f., and we write $F \sim F(m, n)$.


Theorem 4. The PDF of the F-statistic defined in (14) is given by

$$g(f) = \begin{cases} \dfrac{\Gamma[(m+n)/2]}{\Gamma(m/2)\Gamma(n/2)}\left(\dfrac{m}{n}\right)\left(\dfrac{m}{n}f\right)^{(m/2)-1}\left(1 + \dfrac{m}{n}f\right)^{-(m+n)/2}, & f > 0, \\ 0, & f \le 0. \end{cases} \tag{15}$$
The proof is left as an exercise. 


Remark 4. If $X \sim F(m, n)$, then $1/X \sim F(n, m)$. If we take $m = 1$, then $F = [t(n)]^2$, so that $F(1, n)$ and $t^2(n)$ have the same distribution. It also follows that if $Z$ is $C(1, 0)$ [which is the same as $t(1)$], $Z^2$ is $F(1, 1)$.


Remark 5. As usual, we write $F_{m,n,\alpha}$ for the upper $\alpha$ percent point of the $F(m, n)$ distribution, that is,

$$P\{F(m, n) > F_{m,n,\alpha}\} = \alpha. \tag{16}$$

From Remark 4, we have the following relation:

$$F_{m,n,1-\alpha} = \frac{1}{F_{n,m,\alpha}}. \tag{17}$$

It therefore suffices to tabulate values of $F$ that are $\ge 1$. This is done in Table ST5, where values of $F_{m,n,\alpha}$ are listed for selected values of $m$, $n$, and $\alpha$. See Fig. 4 for a plot of $g(f)$.


Theorem 5. Let $X \sim F(m, n)$. Then, for $k > 0$, integral,

$$EX^k = \left(\frac{n}{m}\right)^k\frac{\Gamma[k + (m/2)]\,\Gamma[(n/2) - k]}{\Gamma(m/2)\,\Gamma(n/2)} \qquad \text{for } n > 2k. \tag{18}$$

In particular,

$$EX = \frac{n}{n-2}, \qquad n > 2, \tag{19}$$

and

$$\operatorname{var}(X) = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)}, \qquad n > 4. \tag{20}$$

Fig. 4. F densities.


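Proof. We have for a positive integer $k$,

$$\int_0^\infty f^k f^{(m/2)-1}\left(1 + \frac{m}{n}f\right)^{-(m+n)/2}df = \left(\frac{n}{m}\right)^{k+(m/2)}\int_0^1 x^{k+(m/2)-1}(1 - x)^{(n/2)-k-1}\,dx, \tag{21}$$

where we have changed the variable to $x = (m/n)f[1 + (m/n)f]^{-1}$. The integral on the right side of (21) converges for $(n/2) - k > 0$ and diverges for $(n/2) - k \le 0$. We have

$$EX^k = \frac{\Gamma[(m+n)/2]}{\Gamma(m/2)\Gamma(n/2)}\left(\frac{m}{n}\right)^{m/2}\left(\frac{n}{m}\right)^{k+(m/2)}B\left(k + \frac{m}{2}, \frac{n}{2} - k\right),$$

as asserted.
For $k = 1$ we get

$$EX = \frac{n}{m}\cdot\frac{m/2}{(n/2) - 1} = \frac{n}{n-2}, \qquad n > 2.$$

Also,

$$EX^2 = \left(\frac{n}{m}\right)^2\frac{(m/2)[(m/2) + 1]}{[(n/2) - 1][(n/2) - 2]} = \left(\frac{n}{m}\right)^2\frac{m(m+2)}{(n-2)(n-4)}, \qquad n > 4,$$

and

$$\operatorname{var}(X) = \left(\frac{n}{m}\right)^2\frac{m(m+2)}{(n-2)(n-4)} - \left(\frac{n}{n-2}\right)^2 = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)}, \qquad n > 4.$$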
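As a numerical sanity check, the general moment formula (18) can be evaluated with gamma functions and compared with the closed forms (19) and (20). The sketch below is illustrative; the degrees of freedom $m = 5$, $n = 12$ are arbitrary assumptions, and `f_moment` is a hypothetical helper name.

```python
from math import gamma


def f_moment(k, m, n):
    """E X^k for X ~ F(m, n) from formula (18); requires n > 2k."""
    if n <= 2 * k:
        raise ValueError("the moment of order k exists only for n > 2k")
    return ((n / m) ** k * gamma(k + m / 2) * gamma(n / 2 - k)
            / (gamma(m / 2) * gamma(n / 2)))


m, n = 5, 12                                   # assumed degrees of freedom
mean = f_moment(1, m, n)                       # compare with (19)
var = f_moment(2, m, n) - mean ** 2            # compare with (20)

mean_closed = n / (n - 2)
var_closed = 2 * n ** 2 * (m + n - 2) / (m * (n - 2) ** 2 * (n - 4))
```

For these degrees of freedom the closed forms give a mean of 1.2 and a variance of 1.08, and the gamma-function evaluation agrees up to rounding error.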
Theorem 6. If $X \sim F(m, n)$, then $Y = 1/[1 + (m/n)X]$ is $B(n/2, m/2)$. Consequently, for each $x > 0$,

$$F_X(x) = 1 - F_Y\left[\frac{1}{1 + (m/n)x}\right].$$

If in Definition 3 we take $X$ to be a noncentral $\chi^2$ RV with $m$ d.f. and noncentrality parameter $\delta$, we get a noncentral F RV.


Definition 4. Let $X \sim \chi^2(m, \delta)$ and $Y \sim \chi^2(n)$, and let $X$ and $Y$ be independent. Then the RV

$$F = \frac{X/m}{Y/n} \tag{22}$$

is said to have a noncentral F-distribution with $(m, n)$ d.f. and noncentrality parameter $\delta$.


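It is shown in Problem 2 that if $F$ has a noncentral F-distribution with $(m, n)$ d.f. and noncentrality parameter $\delta$,

$$EF = \frac{n(m + \delta)}{m(n-2)}, \qquad n > 2,$$

and

$$\operatorname{var}(F) = \frac{2n^2}{m^2(n-4)(n-2)^2}\left[(m + \delta)^2 + (n-2)(m + 2\delta)\right], \qquad n > 4.$$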
PROBLEMS 7.4 
1. Let 
-1 f*® 
Py = \P (5) ae i wo D/2e-@!2 day, x>0. 
0 
Show that 
2. Let $X \sim F(m, n, \delta)$. Find $EX$ and $\operatorname{var}(X)$.


3. Let $T$ be a noncentral t-statistic with $n$ d.f. and noncentrality parameter $\delta$. Find $ET$ and $\operatorname{var}(T)$.


4. Let $F \sim F(m, n)$. Then

$$Y = \left(1 + \frac{m}{n}F\right)^{-1} \sim B\left(\frac{n}{2}, \frac{m}{2}\right).$$

Deduce that for $x > 0$,

$$P\{F \le x\} = 1 - P\left\{Y \le \left(1 + \frac{m}{n}x\right)^{-1}\right\}.$$


5. Derive the PDF of an F-statistic with $(m, n)$ d.f.


6. Show that the square of a noncentral t-statistic is a noncentral F-statistic.


7. A sample of size 16 showed a variance of 5.76. Find $c$ such that $P\{|\bar X - \mu| \le c\} = 0.95$, where $\bar X$ is the sample mean and $\mu$ is the population mean. Assume that the sample comes from a normal population.


8. A sample from a normal population produced variance 4.0. Find the size of the sample if the sample mean deviates from the population mean by no more than 2.0 with a probability of at least 0.95.


9. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $\mathcal{N}(0, 4)$. Find $P\left\{\sum_{i=1}^5 X_i^2 \ge 5.75\right\}$.


10. Let $X \sim \chi^2(61)$. Find $P\{X > 50\}$.


11. Let $F \sim F(m, n)$. The random variable $Z = \frac{1}{2}\log F$ is known as Fisher's Z-statistic. Find the PDF of $Z$.


12. Prove Theorem 1.


13. Prove Theorem 2.


14. Prove Theorem 3.


15. Prove Theorem 4.


16. (a) Let $f_1, f_2, \dots$ be PDFs with corresponding MGFs $M_1, M_2, \dots$, respectively. Let $a_j$ ($0 \le a_j \le 1$) be constants such that $\sum_{j=1}^\infty a_j = 1$. Then $f = \sum_{j=1}^\infty a_j f_j$ is a PDF with MGF $M = \sum_{j=1}^\infty a_j M_j$.
(b) Write the MGF of a $\chi^2(n, \delta)$ RV in (6) as

$$M(t) = \sum_{j=0}^\infty a_j M_j(t),$$

where $M_j(t) = (1 - 2t)^{-(2j+n)/2}$ is the MGF of a $\chi^2(2j + n)$ RV and $a_j = e^{-\delta/2}(\delta/2)^j/j!$ is the PMF of a $P(\delta/2)$ RV. Conclude that the PDF of $Y \sim \chi^2(n, \delta)$ is the weighted sum of PDFs of $\chi^2(2j + n)$ RVs, $j = 0, 1, 2, \dots$, with Poisson weights and hence

$$f_Y(y) = \sum_{j=0}^\infty \frac{e^{-\delta/2}(\delta/2)^j}{j!}\cdot\frac{y^{(2j+n)/2-1}\exp(-y/2)}{2^{(2j+n)/2}\,\Gamma[(2j+n)/2]}.$$


7.5 LARGE-SAMPLE THEORY 


In many applications of probability one needs the distribution of a statistic or some function of it. The methods of Section 7.3 when applicable lead to the exact distribution of the statistic under consideration. If not, it may be sufficient to approximate this distribution provided that the sample size is large enough.

Let $\{X_n\}$ be a sequence of RVs that converges in law to $\mathcal{N}(\mu, \sigma^2)$. Then $\{(X_n - \mu)/\sigma\}$ converges in law to $\mathcal{N}(0, 1)$, and conversely. We will say alternatively and equivalently that $\{X_n\}$ is asymptotically normal with mean $\mu$ and variance $\sigma^2$. More generally, we say that $X_n$ is asymptotically normal with "mean" $\mu_n$ and "variance" $\sigma_n^2$, and write $X_n$ is AN$(\mu_n, \sigma_n^2)$, if $\sigma_n > 0$ and as $n \to \infty$,

$$\frac{X_n - \mu_n}{\sigma_n} \xrightarrow{L} \mathcal{N}(0, 1). \tag{1}$$

Here $\mu_n$ is not necessarily the mean of $X_n$ and $\sigma_n^2$ not necessarily its variance. In this case we can approximate, for sufficiently large $n$, $P\{X_n \le t\}$ by $P\{Z \le (t - \mu_n)/\sigma_n\}$, where $Z$ is $\mathcal{N}(0, 1)$.

The most common method to show that $X_n$ is AN$(\mu_n, \sigma_n^2)$ is the central limit theorem of Section 6.6. Thus, according to Theorem 6.6.1, $\sqrt{n}(\bar X_n - \mu) \xrightarrow{L} \mathcal{N}(0, \sigma^2)$ as $n \to \infty$, where $\bar X_n$ is the sample mean of $n$ iid RVs with mean $\mu$ and variance $\sigma^2$. The same result applies to the $k$th sample moment, provided that $E|X|^{2k} < \infty$. Thus

$$\frac{\sum_{j=1}^n X_j^k}{n} \text{ is AN}\left(EX^k, \frac{\operatorname{var}(X^k)}{n}\right).$$

In many large-sample approximations an application of the CLT along with Slutsky's theorem suffices.


Example 1. Let $X_1, X_2, \dots$ be iid $\mathcal{N}(\mu, \sigma^2)$. Consider the RV

$$T_n = \frac{\sqrt{n}(\bar X - \mu)}{S}.$$

The statistic $T_n$ is well known for its applications in statistics and in Section 7.6 we determine its exact distribution. From Example 6.3.4, $(n-1)S^2/n \xrightarrow{P} \sigma^2$ and hence $S/\sigma \xrightarrow{P} 1$. Since $\sqrt{n}(\bar X - \mu)/\sigma \xrightarrow{L} Z \sim \mathcal{N}(0, 1)$, it follows from Slutsky's theorem that $T_n \xrightarrow{L} Z$. Thus for sufficiently large $n$ ($n \ge 30$), we can approximate $P\{T_n \le t\}$ by $P\{Z \le t\}$.
Actually, we do not need the $X$'s to be normally distributed (see Problem 6.6.5).
Often, we need to approximate the distribution of $g(Y_n)$ given that $Y_n$ is AN$(\mu, \sigma_n^2)$.


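Theorem 1. Suppose that $Y_n$ is AN$(\mu, \sigma_n^2)$, with $\sigma_n \to 0$ and $\mu$ a fixed real number. Let $g$ be a real-valued function that is differentiable at $x = \mu$, with $g'(\mu) \ne 0$. Then

$$g(Y_n) \text{ is AN}\left(g(\mu), [g'(\mu)]^2\sigma_n^2\right). \tag{2}$$

Proof. We first show that

$$\frac{g(Y_n) - g(\mu)}{g'(\mu)\sigma_n} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0. \tag{3}$$

Set

$$h(x) = \begin{cases} \dfrac{g(x) - g(\mu)}{x - \mu} - g'(\mu), & x \ne \mu, \\ 0, & x = \mu. \end{cases}$$

Then $h$ is continuous at $x = \mu$. Since

$$Y_n - \mu = \sigma_n\left(\frac{Y_n - \mu}{\sigma_n}\right) \xrightarrow{L} 0$$

by Problem 6.2.7, $Y_n - \mu \xrightarrow{P} 0$, and it follows from Theorem 6.2.4 that $h(Y_n) \xrightarrow{P} h(\mu) = 0$. By Slutsky's theorem, therefore,

$$h(Y_n)\,\frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0.$$

That is,

$$\frac{g(Y_n) - g(\mu)}{\sigma_n g'(\mu)} - \frac{Y_n - \mu}{\sigma_n} \xrightarrow{P} 0.$$

It follows again by Slutsky's theorem that $[g(Y_n) - g(\mu)]/[g'(\mu)\sigma_n]$ has the same limit law as $(Y_n - \mu)/\sigma_n$.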


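Example 2. We know by the CLT that $Y_n = \bar X$ is AN$(\mu, \sigma^2/n)$. Suppose that $g(\bar X) = \bar X(1 - \bar X)$, where $\bar X$ is the sample mean in random sampling from a population with mean $\mu$ and variance $\sigma^2$. Since $g'(\mu) = 1 - 2\mu \ne 0$ for $\mu \ne \frac{1}{2}$, it follows that for $\mu \ne \frac{1}{2}$, $\sigma^2 < \infty$, $\bar X(1 - \bar X)$ is AN$(\mu(1 - \mu), (1 - 2\mu)^2\sigma^2/n)$. Thus

$$P\{\bar X(1 - \bar X) \le y\} = P\left\{\frac{\bar X(1 - \bar X) - \mu(1 - \mu)}{|1 - 2\mu|\sigma/\sqrt{n}} \le \frac{y - \mu(1 - \mu)}{|1 - 2\mu|\sigma/\sqrt{n}}\right\} \approx \Phi\left(\frac{y - \mu(1 - \mu)}{|1 - 2\mu|\sigma/\sqrt{n}}\right)$$

for large $n$, where $\Phi$ is the DF of $\mathcal{N}(0, 1)$.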


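Remark 1. Suppose that $g$ in Theorem 1 is differentiable $k$ times, $k > 1$, at $x = \mu$ and $g^{(i)}(\mu) = 0$ for $1 \le i \le k - 1$, $g^{(k)}(\mu) \ne 0$. Then a similar argument using Taylor's theorem shows that

$$\frac{g(Y_n) - g(\mu)}{\frac{1}{k!}\,g^{(k)}(\mu)\,\sigma_n^k} \xrightarrow{L} Z^k, \tag{4}$$

where $Z$ is a $\mathcal{N}(0, 1)$ RV. Thus in Example 2, when $\mu = \frac{1}{2}$, $g'(\frac{1}{2}) = 0$ and $g''(\frac{1}{2}) = -2 \ne 0$. It follows that

$$n\left[\bar X(1 - \bar X) - \tfrac{1}{4}\right] \xrightarrow{L} -\sigma^2\chi^2(1)$$

since $Z^2 \sim \chi^2(1)$.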

Remark 2. Theorem 1 can be extended to the multivariate case, but we will not 
pursue the development. We refer the reader to Ferguson [26] or Serfling [100]. 


Remark 3. In general, the asymptotic variance $[g'(\mu)]^2\sigma_n^2$ of $g(Y_n)$ will depend on the parameter $\mu$. In problems of inference it will often be desirable to use a transformation $g$ such that the approximate variance $\operatorname{var} g(Y_n)$ is free of the parameter. Such transformations are called variance stabilizing transformations. Let us write $\sigma_n^2 = \sigma^2(\mu)/n$. Then finding a $g$ such that $\operatorname{var} g(Y_n)$ is free of $\mu$ is equivalent to finding a $g$ such that

$$g'(\mu) = \frac{c}{\sigma(\mu)}$$

for all $\mu$, where $c$ is a constant independent of $\mu$. It follows that

$$g(x) = c\int\frac{dx}{\sigma(x)}. \tag{5}$$
Example 3. In Example 2, suppose that $X_1, \dots, X_n$ are iid $b(1, p)$, so that $\sigma^2(\mu) = \mu(1 - \mu)$ with $\mu = p$. Then $\sigma^2(p) = p(1 - p)$ and (5) reduces to

$$g(x) = c\int\frac{dx}{\sqrt{x(1 - x)}} = 2c\arcsin\sqrt{x}.$$

Since $g(0) = 0$, $g(1) = 1$, $c = 1/\pi$ and $g(x) = (2/\pi)\arcsin\sqrt{x}$.
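The defining property of a variance stabilizing transformation, namely that $g'(p)\sigma(p)$ does not depend on $p$, can be checked numerically for $g(x) = (2/\pi)\arcsin\sqrt{x}$. The sketch below uses a central difference for $g'$; the step size and the grid of $p$ values are assumptions of the illustration.

```python
from math import asin, pi, sqrt


def g(x):
    """Arcsine transformation g(x) = (2 / pi) * arcsin(sqrt(x))."""
    return (2 / pi) * asin(sqrt(x))


def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def sigma(p):
    """Standard deviation of a b(1, p) RV."""
    return sqrt(p * (1 - p))


# g'(p) * sigma(p) should be the same constant for every p in (0, 1).
products = {p: derivative(g, p) * sigma(p) for p in (0.1, 0.3, 0.5, 0.9)}
```

The common value is $1/\pi$, so by Theorem 1 the asymptotic variance of $g(\bar X)$ is $1/(\pi^2 n)$ whatever the value of $p$.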


Remark 4. In Section 7.3 we computed exact moments of some statistics in
terms of population parameters. Approximations for moments of $g(X)$ can also be
obtained from series expansions of $g$. Suppose that $g$ is twice differentiable at $x = \mu$.
Then

\[
Eg(X) \approx g(\mu) + E(X - \mu)\, g'(\mu) + \tfrac12 g''(\mu)\, E(X - \mu)^{2} \tag{6}
\]

and

\[
E[g(X) - g(\mu)]^{2} \approx [g'(\mu)]^{2} E(X - \mu)^{2}, \tag{7}
\]

by dropping remainder terms. The case of most interest is to approximate $Eg(\bar X)$ and
$\operatorname{var} g(\bar X)$. In this case, under suitable conditions, one can show that

\[
Eg(\bar X) \approx g(\mu) + \frac{\sigma^{2}}{2n}\, g''(\mu) \tag{8}
\]

and

\[
\operatorname{var} g(\bar X) \approx \frac{1}{n}\, [g'(\mu)]^{2} \sigma^{2}, \tag{9}
\]

where $EX = \mu$ and $\operatorname{var}(X) = \sigma^{2}$.


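In Example 2, when the $X_i$'s are iid $b(1, p)$, $g(x) = x(1 - x)$, $g'(x) = 1 - 2x$,
$g''(x) = -2$, so that

\[
Eg(\bar X) = E[\bar X(1 - \bar X)] \approx p(1 - p) + \frac{\sigma^{2}}{2n}(-2) = p(1 - p)\,\frac{n-1}{n}
\]

and

\[
\operatorname{var} g(\bar X) \approx \frac{p(1-p)}{n}\,(1 - 2p)^{2}.
\]

In this case we can compute $Eg(\bar X)$ and $\operatorname{var} g(\bar X)$ exactly. We have

\[
Eg(\bar X) = E\bar X - E\bar X^{2} = p - \left[\frac{p(1-p)}{n} + p^{2}\right] = p(1-p)\,\frac{n-1}{n},
\]

so that (8) is exact. Also, since $X_i^{k} = X_i$, using Theorem 7.3.4 we have

\[
\begin{aligned}
\operatorname{var} g(\bar X) &= \operatorname{var}(\bar X - \bar X^{2}) \\
&= \operatorname{var}(\bar X) - 2\operatorname{cov}(\bar X, \bar X^{2}) + E\bar X^{4} - (E\bar X^{2})^{2} \\
&= \frac{p(1-p)}{n} \left[ (1 - 2p)^{2} + \frac{2p(1-p)}{n-1} \right] \left( \frac{n-1}{n} \right)^{2}.
\end{aligned}
\]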


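Thus the error in approximation (9), obtained by subtracting (9) from the exact variance, is

\[
\text{error} = \frac{p(1-p)}{n^{3}} \left[ 2p(1-p)(n-1) - (1-2p)^{2}(2n-1) \right],
\]

which is of order $n^{-2}$.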
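The comparison between the approximations (8) and (9) and the exact moments of $g(\bar X) = \bar X(1 - \bar X)$ can be reproduced with exact rational arithmetic. This is an illustrative sketch only; the helper name `exact_moments` and the values $n = 10$, $p = 3/10$ are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def exact_moments(n, p):
    """Exact mean and variance of X-bar (1 - X-bar) for n iid b(1, p) RVs."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    vals = [Fraction(k, n) * (1 - Fraction(k, n)) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, vals))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, vals))
    return mean, var

n, p = 10, Fraction(3, 10)
q = 1 - p
mean, var = exact_moments(n, p)

approx_mean = p * q * Fraction(n - 1, n)        # approximation (8)
approx_var = p * q * (1 - 2 * p) ** 2 / n       # approximation (9)
# Closed form of the exact variance.
closed_var = p * q / n * ((1 - 2 * p) ** 2 + 2 * p * q / (n - 1)) * Fraction(n - 1, n) ** 2
# Difference between the exact variance and (9).
gap = p * q * (2 * p * q * (n - 1) - (1 - 2 * p) ** 2 * (2 * n - 1)) / n**3

print(float(mean), float(approx_mean))
print(float(var), float(approx_var), float(gap))
```

Because `Fraction` arithmetic is exact, the equalities between the enumerated moments and the closed forms can be checked without rounding error.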


Remark 5. Approximations (6) through (9) do not assert the existence of $Eg(X)$
or $Eg(\bar X)$, or $\operatorname{var} g(X)$ or $\operatorname{var} g(\bar X)$.


Remark 6. It is possible to extend (6) through (9) to two (or more) variables by 
using Taylor series expansion in two (or more) variables. 


Finally, we state the following result, which gives the asymptotic distribution of 
the rth order statistic, 1 <r <n, in sampling from a population with an absolutely 
continuous DF F with PDF f. For a proof, see Problem 4. 


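Theorem 2. If $X_{(r)}$ denotes the $r$th-order statistic of a sample $X_1, X_2, \ldots, X_n$
from an absolutely continuous DF $F$ with PDF $f$, then

\[
\left\{ \frac{n}{p(1-p)} \right\}^{1/2} f(\mathfrak{z}_p)\,\{X_{(r)} - \mathfrak{z}_p\} \xrightarrow{L} Z \quad \text{as } n \to \infty, \tag{10}
\]

so that $r/n$ remains fixed, $r/n = p$, where $Z$ is $\mathcal{N}(0, 1)$, and $\mathfrak{z}_p$ is the unique solution
of $F(\mathfrak{z}_p) = p$ (that is, $\mathfrak{z}_p$ is the population quantile of order $p$, assumed unique).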


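Remark 7. The sample quantile of order $p$, $Z_p$, is

\[
\text{AN}\left( \mathfrak{z}_p,\ \frac{1}{[f(\mathfrak{z}_p)]^{2}}\, \frac{p(1-p)}{n} \right),
\]

where $\mathfrak{z}_p$ is the corresponding population quantile and $f$ is the PDF of the population
distribution function. It also follows that $Z_p \xrightarrow{P} \mathfrak{z}_p$.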
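As a quick illustration of Remark 7, the asymptotic variance $p(1-p)/\{n[f(\mathfrak{z}_p)]^{2}\}$ can be evaluated for a normal population with the standard library. This is a sketch under the assumption that the population is $\mathcal{N}(0, 1)$; the function name is hypothetical. For the median ($p = 1/2$) the formula reduces to $\pi/(2n)$.

```python
import math
from statistics import NormalDist

def quantile_asymptotic_variance(p, n, dist=NormalDist()):
    """Asymptotic variance p(1-p) / (n f(z_p)^2) of the sample quantile of order p."""
    z_p = dist.inv_cdf(p)          # population quantile of order p
    return p * (1 - p) / (n * dist.pdf(z_p) ** 2)

n = 100
print(quantile_asymptotic_variance(0.5, n), math.pi / (2 * n))
```

For comparison, the sample mean has variance $1/n$ in this population, so the median is asymptotically less precise by the factor $\pi/2$.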
PROBLEMS 7.5


1. In sampling from a distribution with mean $\mu$ and variance $\sigma^{2}$, find the asymptotic
distribution of (a) $\bar X^{2}$, (b) $1/\bar X$, (c) $\ln |\bar X|^{2}$, and (d) $\exp(\bar X)$, both when
$\mu \ne 0$ and when $\mu = 0$.


2. Let $X \sim P(\lambda)$. Then $(X - \lambda)/\sqrt{\lambda} \xrightarrow{L} \mathcal{N}(0, 1)$. Find a transformation $g$ such
that $(g(X) - g(\lambda))$ has an asymptotic $\mathcal{N}(0, c)$ distribution for large $\lambda$, where $c$
is a suitable constant.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from an absolutely continuous DF $F$ with PDF
$f$. Show that

\[
EX_{(r)} \approx F^{-1}\left( \frac{r}{n+1} \right)
\]

and

\[
\operatorname{var}(X_{(r)}) \approx \frac{r(n - r + 1)}{(n+1)^{2}(n+2)}\, \frac{1}{\{f[F^{-1}(r/(n+1))]\}^{2}}.
\]

[Hint: Let $Y$ be an RV with mean $\mu$, and $\phi$ be a Borel function such that $E\phi(Y)$
exists. Expand $\phi(Y)$ about the point $\mu$ by a Taylor series expansion, and use the
fact that $F(X_{(r)}) = U_{(r)}$.]

4. Prove Theorem 2. [Hint: For any real $\mu$ and $\sigma$ ($> 0$), compute the PDF of
$(U_{(r)} - \mu)/\sigma$ and show that the standardized $U_{(r)}$, $(U_{(r)} - \mu)/\sigma$, is asymptotically
$\mathcal{N}(0, 1)$ under the conditions of the theorem.]

5. Let $X \sim \chi^{2}(n)$. Then $(X - n)/\sqrt{2n}$ is AN$(0, 1)$ and $X/n$ is AN$(1, 2/n)$. Find
a transformation $g$ such that the distribution of $g(X) - g(n)$ is AN$(0, c)$.


6. Suppose that $X$ is $G(1, \theta)$. Find $g$ such that $g(X) - g(\theta)$ is AN$(0, c)$.

7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $E|X_1|^{4} < \infty$. Let $\operatorname{var}(X) = \sigma^{2}$ and $\beta_2 =
\mu_4/\sigma^{4}$.
(a) Using the CLT for iid RVs, show that $\sqrt{n}(S^{2} - \sigma^{2}) \xrightarrow{L} \mathcal{N}(0, \mu_4 - \sigma^{4})$.
(b) Find a transformation $g$ such that $g(S^{2})$ has an asymptotic distribution that
depends on $\beta_2$ alone, not on $\sigma^{2}$.


7.6 DISTRIBUTION OF $(\bar X, S^{2})$ IN SAMPLING FROM A NORMAL POPULATION


Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^{2})$, and write $\bar X = n^{-1} \sum_{i=1}^{n} X_i$ and
$S^{2} = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar X)^{2}$. In this section we show that $\bar X$ and $S^{2}$ are
independent and derive the distribution of $S^{2}$. More precisely, we prove the following
important result.
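Theorem 1. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^{2})$ RVs. Then $\bar X$ and $(X_1 - \bar X,
X_2 - \bar X, \ldots, X_n - \bar X)$ are independent.

Proof. We compute the MGF of $\bar X$ and $X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X$ as follows.
Writing $\bar t = n^{-1} \sum_{i=1}^{n} t_i$, we have

\[
\begin{aligned}
M(t, t_1, t_2, \ldots, t_n)
&= E \exp\{ t\bar X + t_1(X_1 - \bar X) + t_2(X_2 - \bar X) + \cdots + t_n(X_n - \bar X) \} \\
&= E \exp\left\{ \sum_{i=1}^{n} X_i \left( t_i - \frac{t_1 + t_2 + \cdots + t_n - t}{n} \right) \right\} \\
&= E \prod_{i=1}^{n} \exp\left\{ \frac{X_i\,(n t_i - n\bar t + t)}{n} \right\} \\
&= \prod_{i=1}^{n} E \exp\left\{ \frac{X_i\,[t + n(t_i - \bar t)]}{n} \right\} \\
&= \prod_{i=1}^{n} \exp\left\{ \frac{\mu\,[t + n(t_i - \bar t)]}{n} + \frac{\sigma^{2}}{2}\, \frac{1}{n^{2}}\, [t + n(t_i - \bar t)]^{2} \right\} \\
&= \exp\left\{ \frac{\mu}{n} \left[ nt + n \sum_{i=1}^{n} (t_i - \bar t) \right] + \frac{\sigma^{2}}{2n^{2}} \sum_{i=1}^{n} [t + n(t_i - \bar t)]^{2} \right\} \\
&= \exp(\mu t) \exp\left\{ \frac{\sigma^{2}}{2n^{2}} \left( n t^{2} + n^{2} \sum_{i=1}^{n} (t_i - \bar t)^{2} \right) \right\} \\
&= \exp\left( \mu t + \frac{\sigma^{2}}{2n} t^{2} \right) \exp\left\{ \frac{\sigma^{2}}{2} \sum_{i=1}^{n} (t_i - \bar t)^{2} \right\} \\
&= M_{\bar X}(t)\, M_{X_1 - \bar X, \ldots, X_n - \bar X}(t_1, t_2, \ldots, t_n) \\
&= M(t, 0, 0, \ldots, 0)\, M(0, t_1, t_2, \ldots, t_n).
\end{aligned}
\]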
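The factorization at the end of the proof can be checked numerically. In the sketch below (function names and numerical values are illustrative assumptions), the joint MGF is evaluated from the fact that $\sum a_i X_i$ is normal for iid normal $X_i$, with $a_i = t/n + t_i - \bar t$, and compared with the product $M(t, 0, \ldots, 0)\, M(0, t_1, \ldots, t_n)$.

```python
import math

def mgf_linear_normal(coeffs, mu, sigma2):
    """E exp(sum a_i X_i) for iid N(mu, sigma2) RVs X_i."""
    return math.exp(mu * sum(coeffs) + 0.5 * sigma2 * sum(a * a for a in coeffs))

def joint_mgf(t, ts, mu, sigma2):
    """M(t, t_1, ..., t_n): X_i has coefficient t/n + t_i - t-bar in the exponent."""
    n = len(ts)
    tbar = sum(ts) / n
    return mgf_linear_normal([t / n + ti - tbar for ti in ts], mu, sigma2)

mu, sigma2 = 1.5, 2.0
t, ts = 0.3, [0.1, -0.4, 0.25, 0.6]
zeros = [0.0] * len(ts)

lhs = joint_mgf(t, ts, mu, sigma2)
rhs = joint_mgf(t, zeros, mu, sigma2) * joint_mgf(0.0, ts, mu, sigma2)
print(lhs, rhs)
```

The two printed numbers agree because the cross term $2(t/n) \sum (t_i - \bar t)$ in the quadratic form vanishes, which is exactly the step used in the proof.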


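Corollary 1. $\bar X$ and $S^{2}$ are independent.

Corollary 2. $(n - 1)S^{2}/\sigma^{2}$ is $\chi^{2}(n - 1)$.

Since $\sum_{i=1}^{n} (X_i - \mu)^{2}/\sigma^{2}$ is $\chi^{2}(n)$, $n[(\bar X - \mu)/\sigma]^{2}$ is $\chi^{2}(1)$,
and $\bar X$ and $S^{2}$ are independent, it follows from

\[
\frac{\sum_{i=1}^{n} (X_i - \mu)^{2}}{\sigma^{2}} = \frac{n(\bar X - \mu)^{2}}{\sigma^{2}} + \frac{(n-1)S^{2}}{\sigma^{2}}
\]

that

\[
\begin{aligned}
E \exp\left\{ t\, \frac{\sum_{i=1}^{n} (X_i - \mu)^{2}}{\sigma^{2}} \right\}
&= E \exp\left\{ n \left( \frac{\bar X - \mu}{\sigma} \right)^{2} t + (n-1) \frac{S^{2}}{\sigma^{2}}\, t \right\} \\
&= E \exp\left\{ n \left( \frac{\bar X - \mu}{\sigma} \right)^{2} t \right\} E \exp\left\{ (n-1) \frac{S^{2}}{\sigma^{2}}\, t \right\},
\end{aligned}
\]

that is,

\[
(1 - 2t)^{-n/2} = (1 - 2t)^{-1/2}\, E \exp\left\{ (n-1) \frac{S^{2}}{\sigma^{2}}\, t \right\}, \qquad t < \tfrac12,
\]

and we see that

\[
E \exp\left\{ (n-1) \frac{S^{2}}{\sigma^{2}}\, t \right\} = (1 - 2t)^{-(n-1)/2}, \qquad t < \tfrac12.
\]

By the uniqueness of the MGF it follows that $(n - 1)S^{2}/\sigma^{2}$ is $\chi^{2}(n - 1)$.
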
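Corollary 3. The distribution of $\sqrt{n}(\bar X - \mu)/S$ is $t(n - 1)$.

Proof. Since $\sqrt{n}(\bar X - \mu)/\sigma$ is $\mathcal{N}(0, 1)$ and $(n - 1)S^{2}/\sigma^{2} \sim \chi^{2}(n - 1)$, and
since $\bar X$ and $S^{2}$ are independent,

\[
\frac{\sqrt{n}(\bar X - \mu)/\sigma}{\sqrt{[(n-1)S^{2}/\sigma^{2}]/(n-1)}} = \frac{\sqrt{n}(\bar X - \mu)}{S}
\]

is $t(n - 1)$.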


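Corollary 4. If $X_1, X_2, \ldots, X_m$ are iid $\mathcal{N}(\mu_1, \sigma_1^{2})$ RVs, $Y_1, Y_2, \ldots, Y_n$ are iid
$\mathcal{N}(\mu_2, \sigma_2^{2})$ RVs, and the two samples are taken independently, $(S_1^{2}/\sigma_1^{2})/(S_2^{2}/\sigma_2^{2})$ is
$F(m - 1, n - 1)$. If, in particular, $\sigma_1 = \sigma_2$, then $S_1^{2}/S_2^{2}$ is $F(m - 1, n - 1)$.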


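Corollary 5. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$, respectively, be independent
samples from $\mathcal{N}(\mu_1, \sigma_1^{2})$ and $\mathcal{N}(\mu_2, \sigma_2^{2})$. Then

\[
\frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\{[(m-1)S_1^{2}/\sigma_1^{2}] + [(n-1)S_2^{2}/\sigma_2^{2}]\}^{1/2}} \sqrt{\frac{m + n - 2}{\sigma_1^{2}/m + \sigma_2^{2}/n}} \sim t(m + n - 2).
\]

In particular, if $\sigma_1 = \sigma_2$, then

\[
\frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{(m-1)S_1^{2} + (n-1)S_2^{2}}} \sqrt{\frac{mn(m + n - 2)}{m + n}} \sim t(m + n - 2).
\]

Corollary 5 follows since

\[
\bar X - \bar Y \sim \mathcal{N}\left( \mu_1 - \mu_2,\ \frac{\sigma_1^{2}}{m} + \frac{\sigma_2^{2}}{n} \right)
\]

and

\[
\frac{(m-1)S_1^{2}}{\sigma_1^{2}} + \frac{(n-1)S_2^{2}}{\sigma_2^{2}} \sim \chi^{2}(m + n - 2),
\]

and the two statistics are independent.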
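For the equal-variance case of Corollary 5, the statistic can be computed directly from two samples. The sketch below is illustrative only: the function name `pooled_t`, the argument `delta` (standing for the hypothesized value of $\mu_1 - \mu_2$), and the data are assumptions made for the example.

```python
import math
from statistics import mean, variance

def pooled_t(xs, ys, delta=0.0):
    """Statistic of Corollary 5 with sigma_1 = sigma_2, and its degrees of freedom."""
    m, n = len(xs), len(ys)
    # (m-1) S_1^2 + (n-1) S_2^2; statistics.variance uses the divisor n - 1.
    ss = (m - 1) * variance(xs) + (n - 1) * variance(ys)
    t = (mean(xs) - mean(ys) - delta) / math.sqrt(ss)
    t *= math.sqrt(m * n * (m + n - 2) / (m + n))
    return t, m + n - 2

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0, 8.0]
t_stat, df = pooled_t(xs, ys)
print(t_stat, df)
```

This is algebraically the same as the familiar form $(\bar X - \bar Y - \delta)/\{S_p \sqrt{1/m + 1/n}\}$ with pooled variance $S_p^{2} = [(m-1)S_1^{2} + (n-1)S_2^{2}]/(m + n - 2)$.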
Remark 1. The converse of Corollary 1 also holds (see Theorem 5.3.28). 


Remark 2. In sampling from a symmetric distribution, $\bar X$ and $S^{2}$ are uncorrelated
(see Problem 4.5.14).


Remark 3. Alternatively, Corollary 1 could have been derived from Corollary 2
to Theorem 5.4.6 by using the Helmert orthogonal matrix:

\[
A = \begin{pmatrix}
1/\sqrt{n} & 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} \\
-1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\
-1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
-1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & -1/\sqrt{n(n-1)} & \cdots & (n-1)/\sqrt{n(n-1)}
\end{pmatrix}.
\]

For the case of $n = 3$ this was done in Example 4.4.6. In Problem 7 the reader is
asked to work out the details in the general case.
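The key properties of the Helmert matrix can be verified numerically for a small sample. This is a minimal sketch with hypothetical helper names and arbitrary data: it builds $A$ row by row, applies it to a vector $x$, and checks that the first coordinate is $\sqrt{n}\,\bar x$ while the remaining coordinates carry $\sum (x_i - \bar x)^{2}$.

```python
import math

def helmert(n):
    """n x n Helmert matrix: first row constant, row k has k - 1 equal negative entries."""
    rows = [[1 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        d = math.sqrt(k * (k - 1))
        rows.append([-1 / d] * (k - 1) + [(k - 1) / d] + [0.0] * (n - k))
    return rows

def transform(a, x):
    """Matrix-vector product y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

x = [2.0, -1.0, 0.5, 3.0, 1.5]
n = len(x)
a = helmert(n)
y = transform(a, x)

xbar = sum(x) / n
ss = sum((xi - xbar) ** 2 for xi in x)
print(y[0], math.sqrt(n) * xbar)            # y_1 = sqrt(n) * x-bar
print(sum(v * v for v in y[1:]), ss)        # y_2^2 + ... + y_n^2 = (n - 1) s^2
```

Since $A$ is orthogonal, independent $\mathcal{N}(\mu, \sigma^{2})$ coordinates are mapped to independent normal coordinates, which is why $\bar X$ (a function of $Y_1$) and $S^{2}$ (a function of $Y_2, \ldots, Y_n$) are independent.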


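Remark 4. An analytic approach to the development of the distribution of $\bar X$ and
$S^{2}$ is as follows. Assuming without loss of generality that $X_i$ is $\mathcal{N}(0, 1)$, we have as
the joint PDF of $(X_1, X_2, \ldots, X_n)$

\[
\begin{aligned}
f(x_1, x_2, \ldots, x_n) &= \frac{1}{(2\pi)^{n/2}} \exp\left( -\frac12 \sum_{i=1}^{n} x_i^{2} \right) \\
&= \frac{1}{(2\pi)^{n/2}} \exp\left\{ -\frac{(n-1)s^{2} + n\bar x^{2}}{2} \right\}.
\end{aligned}
\]

Changing the variables to $y_1, y_2, \ldots, y_n$ by using the transformation $y_k = (x_k -
\bar x)/s$, we see that

\[
\sum_{k=1}^{n} y_k = 0 \quad \text{and} \quad \sum_{k=1}^{n} y_k^{2} = n - 1.
\]

It follows that two of the $y_k$'s, say $y_{n-1}$ and $y_n$, are functions of the remaining $y_k$.
Thus either

\[
y_{n-1} = \frac{\alpha + \beta}{2} \quad \text{and} \quad y_n = \frac{\alpha - \beta}{2},
\]

or

\[
y_{n-1} = \frac{\alpha - \beta}{2} \quad \text{and} \quad y_n = \frac{\alpha + \beta}{2},
\]

where

\[
\alpha = -\sum_{k=1}^{n-2} y_k \quad \text{and} \quad \beta = \sqrt{2(n-1) - 2 \sum_{k=1}^{n-2} y_k^{2} - \left( \sum_{k=1}^{n-2} y_k \right)^{2}}.
\]

We leave the reader to derive the joint PDF of $(Y_1, Y_2, \ldots, Y_{n-2}, \bar X, S)$, using
the result described in Remark 4.4.2, and to show that the RVs $\bar X$, $S^{2}$, and $(Y_1, Y_2,
\ldots, Y_{n-2})$ are independent.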


PROBLEMS 7.6 


1. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^{2})$ and $\bar X$ and $S^{2}$, respectively,
be the sample mean and the sample variance. Let $X_{n+1} \sim \mathcal{N}(\mu, \sigma^{2})$, and
assume that $X_1, X_2, \ldots, X_n, X_{n+1}$ are independent. Find the sampling distribution
of $[(X_{n+1} - \bar X)/S]\sqrt{n/(n + 1)}$.

2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
$\mathcal{N}(\mu_1, \sigma^{2})$ and $\mathcal{N}(\mu_2, \sigma^{2})$, respectively. Also, let $\alpha$, $\beta$ be two fixed real numbers.
If $\bar X$, $\bar Y$ denote the corresponding sample means, what is the sampling distribution of

\[
\frac{\alpha(\bar X - \mu_1) + \beta(\bar Y - \mu_2)}{\sqrt{\dfrac{(m-1)S_1^{2} + (n-1)S_2^{2}}{m + n - 2}}\ \sqrt{\dfrac{\alpha^{2}}{m} + \dfrac{\beta^{2}}{n}}},
\]

where $S_1^{2}$ and $S_2^{2}$, respectively, denote the sample variances of the $X$'s and the
$Y$'s?

3. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, \sigma^{2})$ and $k$ be a positive
integer. Find $E(S^{2k})$. In particular, find $E(S^{2})$ and $\operatorname{var}(S^{2})$.

4. A random sample of 5 is taken from a normal population with mean 2.5 and
variance $\sigma^{2} = 36$.
(a) Find the probability that the sample variance lies between 30 and 44.
(b) Find the probability that the sample mean lies between 1.3 and 3.5, while
the sample variance lies between 30 and 44.

5. The mean life of a sample of 10 light bulbs was observed to be 1327 hours with
a standard deviation of 425 hours. A second sample of 6 bulbs chosen from a
different batch showed a mean life of 1215 hours with a standard deviation of
375 hours. If the means of the two batches are assumed to be the same, how probable
is the observed difference between the two sample means?

6. Let $S_1^{2}$ and $S_2^{2}$ be the sample variances from two independent samples of sizes
$n_1 = 5$ and $n_2 = 4$ from two populations having the same unknown variance
$\sigma^{2}$. Find (approximately) the probability that $S_1^{2}/S_2^{2} < 1/5.2$ or $> 6.25$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^{2})$. By using the Helmert orthogonal
transformation defined in Remark 3, show that $\bar X$ and $S^{2}$ are independent.

8. Derive the joint PDF of $\bar X$ and $S^{2}$ by using the transformation described in
Remark 4.
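
7.7 SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION


Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal population
with parameters $\mu_1, \mu_2, \rho, \sigma_1^{2}, \sigma_2^{2}$. Let us write

\[
\bar X = n^{-1} \sum_{i=1}^{n} X_i, \qquad \bar Y = n^{-1} \sum_{i=1}^{n} Y_i,
\]

\[
S_1^{2} = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar X)^{2}, \qquad S_2^{2} = (n-1)^{-1} \sum_{i=1}^{n} (Y_i - \bar Y)^{2},
\]

and

\[
S_{11} = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y).
\]

In this section we show that $(\bar X, \bar Y)$ is independent of $(S_1^{2}, S_{11}, S_2^{2})$ and obtain the
distribution of the sample correlation coefficient and regression coefficients (at least
in the special case where $\rho = 0$).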


SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION 345 


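Theorem 1. The random vectors $(\bar X, \bar Y)$ and $(X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X,
Y_1 - \bar Y, Y_2 - \bar Y, \ldots, Y_n - \bar Y)$ are independent. The joint distribution of $(\bar X, \bar Y)$ is
bivariate normal with parameters $\mu_1, \mu_2, \rho, \sigma_1^{2}/n, \sigma_2^{2}/n$.

Proof. The proof follows along the lines of the proof of Theorem 7.6.1. The
MGF of $(\bar X, \bar Y, X_1 - \bar X, \ldots, X_n - \bar X, Y_1 - \bar Y, \ldots, Y_n - \bar Y)$ is given by

\[
\begin{aligned}
M^{*} &= M(u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n) \\
&= E \exp\left\{ u\bar X + v\bar Y + \sum_{i=1}^{n} t_i (X_i - \bar X) + \sum_{i=1}^{n} s_i (Y_i - \bar Y) \right\} \\
&= E \exp\left\{ \sum_{i=1}^{n} X_i \left( \frac{u}{n} + t_i - \bar t \right) + \sum_{i=1}^{n} Y_i \left( \frac{v}{n} + s_i - \bar s \right) \right\},
\end{aligned}
\]

where $\bar t = n^{-1} \sum_{i=1}^{n} t_i$, $\bar s = n^{-1} \sum_{i=1}^{n} s_i$. Therefore,

\[
\begin{aligned}
M^{*} &= \prod_{i=1}^{n} E \exp\left\{ \left( \frac{u}{n} + t_i - \bar t \right) X_i + \left( \frac{v}{n} + s_i - \bar s \right) Y_i \right\} \\
&= \prod_{i=1}^{n} \exp\Bigg\{ \left( \frac{u}{n} + t_i - \bar t \right) \mu_1 + \left( \frac{v}{n} + s_i - \bar s \right) \mu_2 \\
&\qquad + \frac{\sigma_1^{2}[(u/n) + t_i - \bar t]^{2} + 2\rho\sigma_1\sigma_2[(u/n) + t_i - \bar t][(v/n) + s_i - \bar s] + \sigma_2^{2}[(v/n) + s_i - \bar s]^{2}}{2} \Bigg\} \\
&= \exp\left( \mu_1 u + \mu_2 v + \frac{u^{2}\sigma_1^{2} + 2\rho\sigma_1\sigma_2 uv + v^{2}\sigma_2^{2}}{2n} \right) \\
&\qquad \cdot \exp\left\{ \frac12 \sigma_1^{2} \sum_{i=1}^{n} (t_i - \bar t)^{2} + \rho\sigma_1\sigma_2 \sum_{i=1}^{n} (t_i - \bar t)(s_i - \bar s) + \frac12 \sigma_2^{2} \sum_{i=1}^{n} (s_i - \bar s)^{2} \right\} \\
&= M_1(u, v)\, M_2(t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n)
\end{aligned}
\]

for all real $u, v, t_1, t_2, \ldots, t_n, s_1, s_2, \ldots, s_n$, where $M_1$ is the MGF of $(\bar X, \bar Y)$ and
$M_2$ is the MGF of $(X_1 - \bar X, \ldots, X_n - \bar X, Y_1 - \bar Y, \ldots, Y_n - \bar Y)$. Also, $M_1$ is the
MGF of a bivariate normal distribution. This completes the proof.


Corollary. The sample mean vector $(\bar X, \bar Y)$ is independent of the sample variance-covariance
matrix

\[
\begin{pmatrix} S_1^{2} & S_{11} \\ S_{11} & S_2^{2} \end{pmatrix}
\]

in sampling from a bivariate normal population.
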
Remark 1. The result of Theorem 1 can be generalized to the case of sampling
from a $k$-variate normal population. We do not propose to do so here.


Remark 2. Unfortunately, the method of proof of Theorem 1 does not lead to the
distribution of the variance-covariance matrix. The distribution of $(\bar X, \bar Y, S_1^{2}, S_{11}, S_2^{2})$
was found by Fisher [27] and Romanovsky [90]. The general case is due to
Wishart [118], who determined the distribution of the sample variance-covariance
matrix in sampling from a $k$-dimensional normal distribution. The distribution is
named after him.


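We will next compute the distribution of the sample correlation coefficient:

\[
R = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\left[ \sum_{i=1}^{n} (X_i - \bar X)^{2} \sum_{i=1}^{n} (Y_i - \bar Y)^{2} \right]^{1/2}} = \frac{S_{11}}{S_1 S_2}. \tag{1}
\]

It is convenient to introduce the sample regression coefficient of $Y$ on $X$

\[
B_{Y|X} = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^{2}} = \frac{S_{11}}{S_1^{2}} = R\, \frac{S_2}{S_1}. \tag{2}
\]

Since we will need only the distribution of $R$ and $B_{Y|X}$ whenever $\rho = 0$, we make
this simplifying assumption in what follows. The general case is computationally
quite complicated. We refer the reader to Cramér [16] for details.

We note that

\[
R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar X)}{(n-1) S_1 S_2} \tag{3}
\]

and

\[
B_{Y|X} = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar X)}{(n-1) S_1^{2}}. \tag{4}
\]

Moreover,

\[
R^{2} = \frac{B_{Y|X}^{2}\, S_1^{2}}{S_2^{2}}. \tag{5}
\]

In the following we write $B = B_{Y|X}$.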


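Theorem 2. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$, $n > 2$, be a sample from a bivariate
normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^{2}$, $\operatorname{var}(Y) =
\sigma_2^{2}$, and $\operatorname{cov}(X, Y) = 0$. In other words, let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu_1, \sigma_1^{2})$ RVs,
and $Y_1, Y_2, \ldots, Y_n$ be iid $\mathcal{N}(\mu_2, \sigma_2^{2})$ RVs, and suppose that the $X$'s and $Y$'s are
independent. Then the PDF of $R$ is given by

\[
f_1(r) = \begin{cases}
\dfrac{\Gamma[(n-1)/2]}{\Gamma(\tfrac12)\, \Gamma[(n-2)/2]}\, (1 - r^{2})^{(n-4)/2}, & -1 \le r \le 1, \\[2ex]
0, & \text{otherwise};
\end{cases} \tag{6}
\]

and the PDF of $B$ is given by

\[
h_1(b) = \frac{\Gamma(n/2)}{\Gamma(\tfrac12)\, \Gamma[(n-1)/2]}\, \frac{\sigma_1 \sigma_2^{\,n-1}}{(\sigma_2^{2} + \sigma_1^{2} b^{2})^{n/2}}, \qquad -\infty < b < \infty. \tag{7}
\]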


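Proof. Without any loss of generality, we assume that $\mu_1 = \mu_2 = 0$ and $\sigma_1^{2} =
\sigma_2^{2} = 1$, for we can always define

\[
X_i^{*} = \frac{X_i - \mu_1}{\sigma_1} \quad \text{and} \quad Y_i^{*} = \frac{Y_i - \mu_2}{\sigma_2}. \tag{8}
\]

Now note that the conditional distribution of $Y_i$, given $X_1, X_2, \ldots, X_n$, is $\mathcal{N}(0, 1)$,
and $Y_1, Y_2, \ldots, Y_n$, given $X_1, X_2, \ldots, X_n$, are mutually independent. Let us define
the following orthogonal transformation:

\[
u_i = \sum_{j=1}^{n} c_{ij} y_j, \qquad i = 1, 2, \ldots, n, \tag{9}
\]

where $((c_{ij}))_{i,j = 1, 2, \ldots, n}$ is an orthogonal matrix with the first two rows

\[
c_{1j} = \frac{1}{\sqrt{n}}, \qquad j = 1, 2, \ldots, n, \tag{10}
\]

and

\[
c_{2j} = \frac{x_j - \bar x}{\left[ \sum_{i=1}^{n} (x_i - \bar x)^{2} \right]^{1/2}}, \qquad j = 1, 2, \ldots, n. \tag{11}
\]

It follows from orthogonality that for any $i \ge 2$,

\[
\sum_{j=1}^{n} c_{ij} = \sqrt{n} \sum_{j=1}^{n} c_{ij} c_{1j} = 0 \tag{12}
\]

and

\[
\sum_{i=1}^{n} u_i^{2} = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} c_{ij} y_j \right) \left( \sum_{j'=1}^{n} c_{ij'} y_{j'} \right)
= \sum_{j=1}^{n} \sum_{j'=1}^{n} \left( \sum_{i=1}^{n} c_{ij} c_{ij'} \right) y_j y_{j'} = \sum_{j=1}^{n} y_j^{2}. \tag{13}
\]

Moreover,

\[
u_1 = \sqrt{n}\, \bar y \tag{14}
\]

and

\[
u_2 = b \sqrt{\sum_{i=1}^{n} (x_i - \bar x)^{2}}, \tag{15}
\]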


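where $b$ is a value assumed by RV $B$. Also, $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$,
are normal RVs (being linear combinations of the $Y$'s). Thus

\[
E\{U_i \mid X_1, X_2, \ldots, X_n\} = \sum_{j=1}^{n} c_{ij}\, E\{Y_j \mid X_1, X_2, \ldots, X_n\} = 0 \tag{16}
\]

and

\[
\begin{aligned}
\operatorname{cov}\{U_i, U_k \mid X_1, X_2, \ldots, X_n\}
&= \operatorname{cov}\left\{ \sum_{j=1}^{n} c_{ij} Y_j,\ \sum_{p=1}^{n} c_{kp} Y_p \,\Big|\, X_1, X_2, \ldots, X_n \right\} \\
&= \sum_{j=1}^{n} \sum_{p=1}^{n} c_{ij} c_{kp} \operatorname{cov}\{Y_j, Y_p \mid X_1, X_2, \ldots, X_n\} \\
&= \sum_{j=1}^{n} c_{ij} c_{kj}.
\end{aligned}
\]

This last equality follows since

\[
\operatorname{cov}\{Y_j, Y_p \mid X_1, X_2, \ldots, X_n\} = \begin{cases} 0, & j \ne p, \\ 1, & j = p. \end{cases}
\]

From orthogonality, we have

\[
\operatorname{cov}\{U_i, U_k \mid X_1, X_2, \ldots, X_n\} = \begin{cases} 0, & i \ne k, \\ 1, & i = k; \end{cases} \tag{17}
\]

and it follows that the RVs $U_1, U_2, \ldots, U_n$, given $X_1, X_2, \ldots, X_n$, are mutually
independent $\mathcal{N}(0, 1)$. Now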


\[
\sum_{j=1}^{n} (y_j - \bar y)^{2} = \sum_{j=1}^{n} y_j^{2} - n\bar y^{2}
= \sum_{j=1}^{n} u_j^{2} - u_1^{2}
= \sum_{j=2}^{n} u_j^{2}. \tag{18}
\]

Thus

\[
R^{2} = \frac{U_2^{2}}{\sum_{i=2}^{n} U_i^{2}} = \frac{U_2^{2}}{U_2^{2} + \sum_{i=3}^{n} U_i^{2}}. \tag{19}
\]

Writing $U = U_2^{2}$ and $W = \sum_{i=3}^{n} U_i^{2}$, we see that the conditional distribution of $U$,
given $X_1, X_2, \ldots, X_n$, is $\chi^{2}(1)$, and that of $W$, given $X_1, X_2, \ldots, X_n$, is $\chi^{2}(n - 2)$.
Moreover, $U$ and $W$ are independent. Since these conditional distributions do not
involve the $X$'s, we see that $U$ and $W$ are unconditionally independent with $\chi^{2}(1)$
and $\chi^{2}(n - 2)$ distributions, respectively. The joint PDF of $U$ and $W$ is


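\[
\frac{1}{\Gamma(\tfrac12) \sqrt{2}}\, u^{1/2 - 1} e^{-u/2}\ \frac{1}{\Gamma[(n-2)/2]\, 2^{(n-2)/2}}\, w^{(n-2)/2 - 1} e^{-w/2}.
\]

Let $u + w = z$; then $u = r^{2} z$ and $w = z(1 - r^{2})$. The Jacobian of this transformation
is $z$, so that the joint PDF of $R^{2}$ and $Z$ is given by

\[
\frac{1}{\Gamma(\tfrac12)\, \Gamma[(n-2)/2]\, 2^{(n-1)/2}}\, z^{n/2 - 3/2} e^{-z/2} (r^{2})^{-1/2} (1 - r^{2})^{n/2 - 2}.
\]

The marginal PDF of $R^{2}$ is easily computed as

\[
\frac{\Gamma[(n-1)/2]}{\Gamma(\tfrac12)\, \Gamma[(n-2)/2]}\, (r^{2})^{-1/2} (1 - r^{2})^{n/2 - 2}, \qquad 0 \le r^{2} \le 1. \tag{20}
\]

Finally, using Theorem 2.5.4, we get the PDF of $R$ as

\[
f_1(r) = \frac{\Gamma[(n-1)/2]}{\Gamma(\tfrac12)\, \Gamma[(n-2)/2]}\, (1 - r^{2})^{n/2 - 2}, \qquad |r| \le 1.
\]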


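As for the distribution of $B$, note that the conditional PDF of $U_2 = \sqrt{n-1}\, B S_1$,
given $X_1, X_2, \ldots, X_n$, is $\mathcal{N}(0, 1)$, so that the conditional PDF of $B$, given $X_1, X_2,
\ldots, X_n$, is $\mathcal{N}(0, 1/\sum (x_i - \bar x)^{2})$. Let us write $\Lambda = (n - 1)S_1^{2}$. Then the PDF of RV
$\Lambda$ is that of a $\chi^{2}(n - 1)$ RV. Thus the joint PDF of $B$ and $\Lambda$ is given by

\[
h(b, \lambda) = g(b \mid \lambda)\, h_2(\lambda), \tag{21}
\]

where $g(b \mid \lambda)$ is $\mathcal{N}(0, 1/\lambda)$, and $h_2(\lambda)$ is $\chi^{2}(n - 1)$. We have

\[
\begin{aligned}
h_1(b) &= \int_0^{\infty} h(b, \lambda)\, d\lambda \\
&= \frac{1}{2^{n/2}\, \Gamma(\tfrac12)\, \Gamma[(n-1)/2]} \int_0^{\infty} \lambda^{n/2 - 1} e^{-(\lambda/2)(1 + b^{2})}\, d\lambda \\
&= \frac{\Gamma(n/2)}{\Gamma(\tfrac12)\, \Gamma[(n-1)/2]}\, \frac{1}{(1 + b^{2})^{n/2}}, \qquad -\infty < b < \infty.
\end{aligned} \tag{22}
\]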


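To complete the proof let us write

\[
X_i = \mu_1 + X_i^{*} \sigma_1 \quad \text{and} \quad Y_i = \mu_2 + Y_i^{*} \sigma_2,
\]

where $X_i^{*} \sim \mathcal{N}(0, 1)$ and $Y_i^{*} \sim \mathcal{N}(0, 1)$. Then $X_i \sim \mathcal{N}(\mu_1, \sigma_1^{2})$, $Y_i \sim \mathcal{N}(\mu_2, \sigma_2^{2})$,
and

\[
R = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^{n} (X_i - \bar X)^{2} \sum_{i=1}^{n} (Y_i - \bar Y)^{2}}}
= \frac{\sum_{i=1}^{n} (X_i^{*} - \bar X^{*})(Y_i^{*} - \bar Y^{*})}{\sqrt{\sum_{i=1}^{n} (X_i^{*} - \bar X^{*})^{2} \sum_{i=1}^{n} (Y_i^{*} - \bar Y^{*})^{2}}} = R^{*}, \tag{23}
\]

so that the PDF of $R$ is the same as derived above. Also,

\[
B = \frac{\sigma_2}{\sigma_1}\, \frac{\sum_{i=1}^{n} (X_i^{*} - \bar X^{*})(Y_i^{*} - \bar Y^{*})}{\sum_{i=1}^{n} (X_i^{*} - \bar X^{*})^{2}} = \frac{\sigma_2}{\sigma_1}\, B^{*}, \tag{24}
\]

where the PDF of $B^{*}$ is given by (22). Relations (22) and (24) are used to find the
PDF of $B$. We leave the reader to carry out these simple details.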


Remark 3. In view of (23), namely the invariance of $R$ under translation and
(positive) scale changes, we note that for fixed $n$ the sampling distribution of $R$,
under $\rho = 0$, does not depend on $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$. In the general case when
$\rho \ne 0$, one can show that for fixed $n$ the distribution of $R$ depends only on $\rho$ but not
on $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ (see, for example, Cramér [16, p. 398]).


Remark 4. Let us change the variable to

\[
T = \frac{R}{\sqrt{1 - R^{2}}}\, \sqrt{n - 2}. \tag{25}
\]

Then

\[
1 - R^{2} = \left( 1 + \frac{T^{2}}{n - 2} \right)^{-1},
\]

and the PDF of $T$ is given by

\[
p(t) = \frac{1}{\sqrt{n-2}\; B[(n-2)/2,\ \tfrac12]}\, \frac{1}{[1 + t^{2}/(n-2)]^{(n-1)/2}}, \tag{26}
\]

which is the PDF of a $t$-statistic with $n - 2$ d.f. Thus $T$ defined by (25) has a $t(n - 2)$
distribution, provided that $\rho = 0$. This result facilitates the computation of probabilities
under the PDF of $R$ when $\rho = 0$.
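The equivalence between tail probabilities of $R$ and of $T$ can be illustrated by numerical integration. The sketch below is an assumption-laden example rather than the book's method: it integrates the PDF of $R$ under $\rho = 0$ and the $t(n - 2)$ PDF with Simpson's rule, using the arbitrary values $n = 12$ and $r_0 = 0.5$; all function names are hypothetical.

```python
import math

def pdf_r(r, n):
    """PDF of R under rho = 0: proportional to (1 - r^2)^((n-4)/2) on [-1, 1]."""
    c = math.gamma((n - 1) / 2) / (math.gamma(0.5) * math.gamma((n - 2) / 2))
    return c * (1 - r * r) ** ((n - 4) / 2)

def pdf_t(t, df):
    """PDF of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def r_to_t(r, n):
    """Transformation T = R sqrt(n - 2) / sqrt(1 - R^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n, r0 = 12, 0.5
p_from_r = simpson(lambda r: pdf_r(r, n), r0, 1.0)          # P{R > r0}
t0 = r_to_t(r0, n)
p_from_t = 0.5 - simpson(lambda t: pdf_t(t, n - 2), 0.0, t0)  # P{T > t0}
print(t0, p_from_r, p_from_t)
```

In practice one would read $P\{T > t_0\}$ from a $t$ table or a statistics library instead of integrating; the integration here only confirms that both routes give the same probability.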


Remark 5. To compute the PDF of $B_{X|Y} = R(S_1/S_2)$, the sample regression
coefficient of $X$ on $Y$, all we need to do is to interchange $\sigma_1$ and $\sigma_2$ in (7).


Remark 6. From (7) we can compute the mean and variance of $B$. For $n > 2$,
clearly,

\[
EB = 0,
\]

and for $n > 3$, we can show that

\[
EB^{2} = \operatorname{var}(B) = \frac{\sigma_2^{2}}{\sigma_1^{2}}\, \frac{1}{n - 3}.
\]

Similarly, we can use (6) to compute the mean and variance of $R$. We have, for $n > 4$,
under $\rho = 0$,

\[
ER = 0
\]

and

\[
ER^{2} = \operatorname{var}(R) = \frac{1}{n - 1}.
\]

PROBLEMS 7.7 
1. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal
population with $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \operatorname{var}(Y) = \sigma^{2}$, and
$\operatorname{cov}(X, Y) = \rho\sigma^{2}$. Let $\bar X$, $\bar Y$ denote the corresponding sample means, $S_1^{2}$, $S_2^{2}$,
the corresponding sample variances, and $S_{11}$, the sample covariance. Write $R =
2S_{11}/(S_1^{2} + S_2^{2})$. Show that the PDF of $R$ is given by

\[
f(r) = \frac{\Gamma(n/2)}{\sqrt{\pi}\, \Gamma[(n-1)/2]}\, (1 - \rho^{2})^{(n-1)/2} (1 - \rho r)^{-(n-1)} (1 - r^{2})^{(n-3)/2}, \qquad |r| < 1.
\]

[Hint: Let $U = (X + Y)/2$, and $V = (X - Y)/2$, and observe that the random
vector $(U, V)$ is also bivariate normal. In fact, $U$ and $V$ are independent.]
(Rastogi [87])

2. Let $X$ and $Y$ be independent normal RVs. A sample of $n = 11$ observations on
$(X, Y)$ produces sample correlation coefficient $r = 0.40$. Find the probability of
obtaining a value of $R$ that exceeds the observed value.

3. Let $X_1$, $X_2$ be jointly normally distributed with zero means, unit variances, and
correlation coefficient $\rho$. Let $S$ be a $\chi^{2}(n)$ RV that is independent of $(X_1, X_2)$.
Then the joint distribution of $Y_1 = X_1/\sqrt{S/n}$ and $Y_2 = X_2/\sqrt{S/n}$ is known as a
central bivariate $t$-distribution. Find the joint PDF of $(Y_1, Y_2)$ and the marginal
PDFs of $Y_1$ and $Y_2$, respectively.

4. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal distribution
with parameters $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \operatorname{var}(Y_i) = \sigma^{2}$, and
$\operatorname{cov}(X_i, Y_i) = \rho\sigma^{2}$, $i = 1, 2, \ldots, n$. Find the distribution of the statistic

\[
T(\mathbf{X}, \mathbf{Y}) = \sqrt{n}\, \frac{(\bar X - \mu_1) - (\bar Y - \mu_2)}{\sqrt{\sum_{i=1}^{n} (X_i - Y_i - \bar X + \bar Y)^{2}}}.
\]


CHAPTER 8 


Parametric Point Estimation 


8.1 INTRODUCTION 


In this chapter we study the theory of point estimation. Suppose, for example, that a
random variable $X$ is known to have a normal distribution $\mathcal{N}(\mu, \sigma^{2})$, but we do not
know one of the parameters, say $\mu$. Suppose further that a sample $X_1, X_2, \ldots, X_n$ is
taken on $X$. The problem of point estimation is to pick a (one-dimensional) statistic
$T(X_1, X_2, \ldots, X_n)$ that best estimates the parameter $\mu$. The numerical value of $T$
when the realization is $x_1, x_2, \ldots, x_n$ is frequently called an estimate of $\mu$, while the
statistic $T$ is called an estimator of $\mu$. If both $\mu$ and $\sigma^{2}$ are unknown, we seek a joint
statistic $T = (U, V)$ as an estimator of $(\mu, \sigma^{2})$.

In Section 8.2 we formally describe the problem of parametric point estimation. 
Since the class of all estimators in most problems is too large, it is not possible to find 
the “best” estimator in this class. One narrows the search somewhat by requiring that 
the estimators have some specified desirable properties. We describe some of these 
and also outline some criteria for comparing estimators. 

Section 8.3 deals, in detail, with some important properties of statistics, such as 
sufficiency, completeness, and ancillarity. We use these properties in later sections to 
facilitate our search for optimal estimators. Sufficiency, completeness, and ancillarity 
also have applications in other branches of statistical inference, such as testing of 
hypotheses and nonparametric theory. 

In Section 8.4 we investigate the criterion of unbiased estimation and study meth- 
ods for obtaining optimal estimators in the class of unbiased estimators. In Section 
8.5 we derive two lower bounds for variance of an unbiased estimator. These bounds 
can sometimes help in obtaining the “best” unbiased estimator. 

In Section 8.6 we describe one of the oldest methods of estimation, and in Section 
8.7 we study the method of maximum likelihood estimation and its large-sample 
properties. Section 8.8 is devoted to Bayes and minimax estimation, and Section 8.9 
deals with equivariant estimation. 
8.2 PROBLEM OF POINT ESTIMATION


Let $X$ be an RV defined on a probability space $(\Omega, \mathcal{S}, P)$. Suppose that the DF $F$ of $X$
depends on a certain number of parameters, and suppose further that the functional
form of $F$ is known except perhaps for a finite number of these parameters. Let
$\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ be the unknown parameter associated with $F$.


Definition 1. The set of all admissible values of the parameters of a DF $F$ is
called the parameter space.


Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an RV with DF $F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is
a vector of unknown parameters, $\theta \in \Theta$. Let $\psi$ be a real-valued function on $\Theta$. In
this chapter we investigate the problem of approximating $\psi(\theta)$ on the basis of the
observed value $\mathbf{x}$ of $\mathbf{X}$.


Definition 2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n) \sim P_\theta$, $\theta \in \Theta$. A statistic $\delta(\mathbf{X})$ is said
to be a (point) estimator of $\psi$ if $\delta : \mathfrak{X} \to \Theta$, where $\mathfrak{X}$ is the space of values of $\mathbf{X}$.


The problem of point estimation is to find an estimator $\delta$ for the unknown parametric
function $\psi(\theta)$ that has some nice properties. The value $\delta(\mathbf{x})$ of $\delta(\mathbf{X})$ for the
data $\mathbf{x}$ is called the estimate of $\psi(\theta)$.

In most problems $X_1, X_2, \ldots, X_n$ are iid RVs with common DF $F_\theta$.


Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$, where $\Theta = \{\theta > 0\}$ and $\theta$ is to
be estimated. Then $\mathfrak{X} = \mathcal{R}_n^{+}$, and any map $\delta : \mathfrak{X} \to (0, \infty)$ is an estimator of $\theta$. Some
typical estimators of $\theta$ are $\bar X = n^{-1} \sum_{j=1}^{n} X_j$ and $\{2/[n(n + 1)]\} \sum_{j=1}^{n} j X_j$.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, where $p \in [0, 1]$. Then $\bar X$ is
an estimator of $p$ and so also are $\delta_1(\mathbf{X}) = X_1$, $\delta_2(\mathbf{X}) = (X_1 + X_n)/2$, and $\delta_3(\mathbf{X}) =
\sum_{j=1}^{n} a_j X_j$, where $0 \le a_j \le 1$, $\sum_{j=1}^{n} a_j = 1$.


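It is clear that in any given problem of estimation we may have a large, often
an infinite, class of appropriate estimators to choose from. Clearly, we would like
the estimator $\delta$ to be close to $\psi(\theta)$. Since $\delta$ is a statistic, the usual measure of
closeness $|\delta(\mathbf{X}) - \psi(\theta)|$ is also an RV, so we interpret "$\delta$ close to $\psi$" to mean "close on
the average." Examples of such measures of closeness are

\[
P_\theta\{|\delta(\mathbf{X}) - \psi(\theta)| < \varepsilon\} \tag{1}
\]

for some $\varepsilon > 0$, and

\[
E_\theta |\delta(\mathbf{X}) - \psi(\theta)|^{r} \tag{2}
\]

for some $r > 0$. Obviously, we want (1) to be large but (2) to be small. For $r = 2$,
the quantity defined in (2) is called mean square error and we denote it by

\[
\text{MSE}_\theta(\delta) = E_\theta\{\delta(\mathbf{X}) - \psi(\theta)\}^{2}. \tag{3}
\]

Among all estimators for $\psi$, we would like to choose one, say $\delta_0$, such that

\[
P_\theta\{|\delta_0(\mathbf{X}) - \psi(\theta)| < \varepsilon\} \ge P_\theta\{|\delta(\mathbf{X}) - \psi(\theta)| < \varepsilon\} \tag{4}
\]

for all $\delta$, all $\varepsilon > 0$, and all $\theta$. For (2), the requirement is to choose $\delta_0$ such that

\[
\text{MSE}_\theta(\delta_0) \le \text{MSE}_\theta(\delta) \tag{5}
\]

for all $\delta$ and all $\theta \in \Theta$. Estimators satisfying (4) or (5) do not generally exist.
We note that, since the cross-product term has expectation zero,

\[
\begin{aligned}
\text{MSE}_\theta(\delta) &= E_\theta\{\delta(\mathbf{X}) - E_\theta \delta(\mathbf{X})\}^{2} + \{E_\theta \delta(\mathbf{X}) - \psi(\theta)\}^{2} \\
&= \operatorname{var}_\theta \delta(\mathbf{X}) + \{b(\delta, \psi)\}^{2},
\end{aligned} \tag{6}
\]

where

\[
b(\delta, \psi) = E_\theta \delta(\mathbf{X}) - \psi(\theta) \tag{7}
\]

is called the bias of $\delta$. An estimator that has small MSE has small bias and variance.
To control MSE, we need to control both variance and bias.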
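The decomposition of MSE into variance plus squared bias can be checked exactly for Bernoulli sampling. The following sketch is illustrative: it compares the sample mean with the estimator $(\sum X_i + 1)/(n + 2)$, chosen here only as an example of a biased estimator; the function names and the values of $n$ and $p$ are assumptions.

```python
from fractions import Fraction
from math import comb

def moments(estimator, n, p):
    """Exact bias, variance and MSE of estimator(k, n), where k = sum of n iid b(1, p) RVs."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    vals = [estimator(k, n) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, vals))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, vals))
    mse = sum(w * (v - p) ** 2 for w, v in zip(pmf, vals))
    return mean - p, var, mse

def sample_mean(k, n):
    return Fraction(k, n)

def shrunk(k, n):
    return Fraction(k + 1, n + 2)

n = 20
for p in (Fraction(1, 4), Fraction(1, 100)):
    for name, est in (("sample mean", sample_mean), ("shrunk", shrunk)):
        bias, var, mse = moments(est, n, p)
        print(float(p), name, float(bias), float(var), float(mse))
```

With these values the biased estimator has the smaller MSE at $p = 1/4$ but the larger MSE at $p = 1/100$, which illustrates why no estimator satisfies (5) for all $\theta$ in general.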

One approach is to restrict attention to estimators which have zero bias, that is,

\[
E_\theta \delta(\mathbf{X}) = \psi(\theta) \quad \text{for all } \theta \in \Theta. \tag{8}
\]

The condition of unbiasedness (8) ensures that on average, the estimator $\delta$ has no
systematic error; it neither over- nor underestimates $\psi$ on average. If we restrict attention
to the class of unbiased estimators, we need to find an estimator $\delta_0$ in this
class such that $\delta_0$ has the least variance for all $\theta \in \Theta$. The theory of unbiased estimation
is developed in Section 8.4.

Another approach is to replace $|\delta - \psi|^{r}$ in (2) by a more general function. Let
$L(\theta, \delta)$ measure the loss in estimating $\psi$ by $\delta$. Assume that $L$, the loss function,
satisfies $L(\theta, \delta) \ge 0$ for all $\theta$ and $\delta$, and $L(\theta, \psi(\theta)) = 0$ for all $\theta$. Measure average
loss by the risk function

\[
R(\theta, \delta) = E_\theta L(\theta, \delta(\mathbf{X})). \tag{9}
\]

Instead of seeking an estimator that minimizes $R$, the risk, uniformly in $\theta$, we minimize

\[
\int_\Theta R(\theta, \delta)\, \pi(\theta)\, d\theta \tag{10}
\]

for some weight function $\pi$ on $\Theta$, or minimize

\[
\sup_{\theta \in \Theta} R(\theta, \delta). \tag{11}
\]

The estimator that minimizes the average risk defined in (10) leads to the Bayes estimator,
and the estimator that minimizes (11) leads to the minimax estimator. Bayes
and minimax estimation are discussed in Section 8.8.

Sometimes there are symmetries in the problem which may be used to restrict 
attention to estimators that exhibit the same symmetry. Consider, for example, an 
experiment in which the length of life of a light bulb is measured. Then an estimator 
obtained from the measurements expressed in hours and minutes must agree with 
an estimator obtained from the measurements expressed in minutes. If X represents 
measurements in original units (hours) and $Y$ represents corresponding measurements
in transformed units (minutes), then $Y = cX$ (here $c = 60$). If $\delta(X)$ is an estimator
of the true mean, we would expect $\delta(Y)$, the estimator of the true mean, to correspond
to $\delta(X)$ according to the relation $\delta(Y) = c\,\delta(X)$. That is, $\delta(cX) = c\,\delta(X)$ for
all $c > 0$. This is an example of an equivariant estimator, a topic under extensive
discussion in Section 8.9.

Finally, we consider some large-sample properties of estimators. As the sample
size $n \to \infty$, the data $\mathbf{x}$ are practically the whole population, and we should expect
$\delta(\mathbf{X})$ to approach $\psi(\theta)$ in some sense. For example, if $\delta(\mathbf{X}) = \bar X$, $\psi(\theta) = E_\theta X_1$,
and $X_1, X_2, \ldots, X_n$ are iid RVs with finite mean, the strong law of large numbers
tells us that $\bar X \to E_\theta X_1$ with probability 1. This property of a sequence of estimators
is called consistency.


Definition 3. Let $X_1, X_2, \ldots$ be a sequence of iid RVs with common DF $F_\theta$,
$\theta \in \Theta$. A sequence of point estimators $T_n(X_1, X_2, \ldots, X_n) = T_n$ will be called
consistent for $\psi(\theta)$ if

\[
T_n \xrightarrow{P} \psi(\theta) \quad \text{as } n \to \infty
\]

for each fixed $\theta \in \Theta$.


Remark 1. Recall that $T_n \xrightarrow{P} \psi(\theta)$ if and only if $P\{|T_n - \psi(\theta)| > \varepsilon\} \to 0$ as
$n \to \infty$ for every $\varepsilon > 0$. One can similarly define strong consistency of a sequence
of estimators $T_n$ if $T_n \xrightarrow{\text{a.s.}} \psi(\theta)$. Sometimes, one speaks of consistency in the $r$th
mean when $T_n \xrightarrow{r} \psi(\theta)$. In what follows, consistency will mean weak consistency
of $T_n$ for $\psi(\theta)$, that is, $T_n \xrightarrow{P} \psi(\theta)$.


It is important to remember that consistency is a large-sample property. Moreover, 
we speak of consistency of a sequence of estimators rather than one point estimator. 


Example 3. Let $X_1, X_2, \ldots$ be iid $b(1, p)$ RVs. Then $EX_1 = p$ and it follows
by the WLLN that

\[
\frac{\sum_{i=1}^{n} X_i}{n} \xrightarrow{P} p.
\]

Thus $\bar X$ is consistent for $p$. Also, $(\sum_{i=1}^{n} X_i + 1)/(n + 2) \xrightarrow{P} p$, so that a consistent
estimator need not be unique. Indeed, if $T_n \xrightarrow{P} p$ and $c_n \to 0$ as $n \to \infty$, then

\[
T_n + c_n \xrightarrow{P} p.
\]

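Theorem 1. If $X_1, X_2, \ldots$ are iid RVs with common law $\mathcal{L}(X)$, and $E|X|^{p} < \infty$
for some positive integer $p$, then

\[
\frac{\sum_{i=1}^{n} X_i^{k}}{n} \xrightarrow{P} EX^{k} \quad \text{for } 1 \le k \le p,
\]

and $n^{-1} \sum_{i=1}^{n} X_i^{k}$ is consistent for $EX^{k}$, $1 \le k \le p$. Moreover, if $c_n$ is any sequence
of constants such that $c_n \to 0$ as $n \to \infty$, then $(n^{-1} \sum_{i=1}^{n} X_i^{k} + c_n)$ is also consistent
for $EX^{k}$, $1 \le k \le p$. Also, if $c_n \to 1$ as $n \to \infty$, then $(c_n n^{-1} \sum_{i=1}^{n} X_i^{k})$ is consistent
for $EX^{k}$. This is simply a restatement of the WLLN for iid RVs.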


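Example 4. Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\mu, \sigma^{2})$ RVs. If $S^{2}$ is the sample variance,
we know that $(n - 1)S^{2}/\sigma^{2} \sim \chi^{2}(n - 1)$. Thus $E(S^{2}/\sigma^{2}) = 1$ and $\operatorname{var}(S^{2}/\sigma^{2}) =
2/(n - 1)$. It follows that

\[
P\{|S^{2} - \sigma^{2}| > \varepsilon\} \le \frac{\operatorname{var}(S^{2})}{\varepsilon^{2}} = \frac{2\sigma^{4}}{(n-1)\varepsilon^{2}} \to 0 \quad \text{as } n \to \infty.
\]

Thus $S^{2} \xrightarrow{P} \sigma^{2}$. Actually, this result holds for any sequence of iid RVs with $E|X|^{2} <
\infty$ and can be obtained from Theorem 1.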


Example 4 is a particular case of the following theorem. 


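Theorem 2. If $T_n$ is a sequence of estimators such that $ET_n \to \psi(\theta)$ and
$\operatorname{var}(T_n) \to 0$ as $n \to \infty$, then $T_n$ is consistent for $\psi(\theta)$.


Proof. We have

\[
\begin{aligned}
P\{|T_n - \psi(\theta)| > \varepsilon\} &\le \varepsilon^{-2} E\{T_n - ET_n + ET_n - \psi(\theta)\}^{2} \\
&= \varepsilon^{-2} \{\operatorname{var}(T_n) + [ET_n - \psi(\theta)]^{2}\} \to 0 \quad \text{as } n \to \infty.
\end{aligned}
\]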

Other large-sample properties of estimators are asymptotic unbiasedness, asymptotic
normality, and asymptotic efficiency. A sequence of estimators $\{T_n\}$ is asymptotically
unbiased for $\psi(\theta)$ if

\[
\lim_{n \to \infty} E_\theta T_n(\mathbf{X}) = \psi(\theta)
\]

for all $\theta$. A consistent sequence of estimators $\{T_n\}$ is said to be consistent asymptotically
normal (CAN) for $\psi(\theta)$ if $T_n \sim \text{AN}(\psi(\theta), v(\theta)/n)$ for all $\theta \in \Theta$. If
$v(\theta) = 1/I(\theta)$, where $I(\theta)$ is the Fisher information (Section 8.7), then $\{T_n\}$ is
known as a best asymptotically normal (BAN) estimator.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$ RVs. Then $T_n = \sum_{i=1}^{n} X_i/(n +
1)$ is asymptotically unbiased for $\theta$ and a BAN estimator for $\theta$ with $v(\theta) = 1$.


In Section 8.7 we consider large-sample properties of maximum likelihood esti- 
mators, and in Section 8.5 asymptotic efficiency is introduced. 


PROBLEMS 8.2 


1. Suppose that $T_n$ is a sequence of estimators for parameter $\theta$ that satisfies the
conditions of Theorem 2. Then $T_n \xrightarrow{2} \theta$, that is, $T_n$ is squared-error consistent
for $\theta$. If $T_n$ is consistent for $\theta$ and $|T_n - \theta| \le A < \infty$ for all $\theta$ and all $(x_1, x_2,
\ldots, x_n) \in \mathcal{R}_n$, show that $T_n \xrightarrow{2} \theta$. If, however, $|T_n - \theta| \le A_n < \infty$, show that
$T_n$ may not be squared-error consistent for $\theta$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$, $\theta \in \Theta = (0, \infty)$. Let $X_{(n)} =
\max\{X_1, X_2, \ldots, X_n\}$. Show that $X_{(n)} \xrightarrow{P} \theta$. Write $Y_n = 2\bar X$. Is $Y_n$ consistent
for $\theta$?

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with $EX_i = \mu$ and $E|X_i|^{2} < \infty$. Show that
$T(X_1, X_2, \ldots, X_n) = 2[n(n + 1)]^{-1} \sum_{i=1}^{n} i X_i$ is a consistent estimator for $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U[0, \theta]$. Show that $T(X_1, X_2, \ldots, X_n) =
(\prod_{i=1}^{n} X_i)^{1/n}$ is a consistent estimator for $\theta e^{-1}$.

5. In Problem 2, show that $T(\mathbf{X}) = X_{(n)}$ is asymptotically unbiased for $\theta$ and is not
BAN. [Show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.]

6. In Problem 5, consider the class of estimators $T(\mathbf{X}) = cX_{(n)}$, $c > 0$. Show that
the estimator $T_0(\mathbf{X}) = (n + 2)X_{(n)}/(n + 1)$ in this class has the least MSE.

7. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp\{-(x - \theta)\}$, $x > \theta$. Consider
the class of estimators $T(\mathbf{X}) = X_{(1)} + b$, $b \in \mathcal{R}$. Show that the estimator that
has the smallest MSE in this class is given by $T(\mathbf{X}) = X_{(1)} - 1/n$.


8.3 SUFFICIENCY, COMPLETENESS, AND ANCILLARITY 


After the completion of any experiment, the job of a statistician is to interpret the
data she has collected and to draw some statistically valid conclusions about the
population under investigation. In addition to being costly to store, the raw data by
themselves are not suitable for this purpose. Therefore, the statistician would like to
condense the data by computing some statistics from them and to base her analysis
on these statistics, provided that there is "no loss of information" in doing so. In
many problems of statistical inference a function of the observations contains as
much information about the unknown parameter as do all the observed values. The
following example illustrates this point.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, where $\mu$ is unknown.
Suppose that we transform variables $X_1, X_2, \ldots, X_n$ to $Y_1, Y_2, \ldots, Y_n$ with
the help of an orthogonal transformation so that $Y_1$ is $\mathcal{N}(\sqrt{n}\,\mu, 1)$, $Y_2, \ldots, Y_n$ are
iid $\mathcal{N}(0, 1)$, and $Y_1, Y_2, \ldots, Y_n$ are independent. (Take $y_1 = \sqrt{n}\,\bar{x}$, and for $k = 2, \ldots, n$,
$y_k = [(k-1)x_k - (x_1 + \cdots + x_{k-1})]/\sqrt{k(k-1)}$.) To estimate $\mu$ we can
use either the observed values of $X_1, X_2, \ldots, X_n$ or simply the observed value of
$Y_1 = \sqrt{n}\,\bar{X}$. The RVs $Y_2, Y_3, \ldots, Y_n$ provide no information about $\mu$. Clearly, $Y_1$
is preferable since one need not keep a record of all the observations; it suffices to
accumulate the observations and compute $y_1$. Any analysis of the data based on $y_1$
is just as effective as any analysis that could be based on the $x_i$'s. We note that $Y_1$ takes
values in $\mathcal{R}_1$, whereas $(X_1, X_2, \ldots, X_n)$ takes values in $\mathcal{R}_n$.
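

The transformation in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `helmert` and the sample values are illustrative assumptions. It builds $y_1, \ldots, y_n$ from the formulas in the example and records the two sums of squares, which must agree if the transformation is orthogonal.

```python
import math

def helmert(x):
    """Map (x_1, ..., x_n) to (y_1, ..., y_n) using the formulas of Example 1."""
    n = len(x)
    y = [math.sqrt(n) * sum(x) / n]  # y_1 = sqrt(n) * mean(x)
    for k in range(2, n + 1):
        # y_k = [(k-1) x_k - (x_1 + ... + x_{k-1})] / sqrt(k (k-1))
        y.append(((k - 1) * x[k - 1] - sum(x[:k - 1])) / math.sqrt(k * (k - 1)))
    return y

x = [1.2, -0.7, 3.1, 0.4, 2.0]      # arbitrary illustrative data
y = helmert(x)
sum_sq_x = sum(v * v for v in x)    # squared length before the transformation
sum_sq_y = sum(v * v for v in y)    # squared length after the transformation
```

An orthogonal map preserves squared length, so `sum_sq_x` and `sum_sq_y` should agree up to rounding, while only `y[0]` carries the sample mean.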


A rigorous definition of the concept involved in the discussion above requires the 
notion of a conditional distribution and is beyond the scope of this book. In view of 
the discussion of conditional probability distributions in Section 4.2, the following 
definition will suffice for our purposes. 


Definition 1. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $\{F_\theta : \theta \in \Theta\}$. A
statistic $T = T(\mathbf{X})$ is sufficient for $\theta$ or for the family of distributions $\{F_\theta : \theta \in \Theta\}$
if and only if the conditional distribution of $\mathbf{X}$, given $T = t$, does not depend on $\theta$
(except perhaps for a null set $A$, $P_\theta\{T \in A\} = 0$ for all $\theta$).


Remark 1. The outcome $X_1, X_2, \ldots, X_n$ is always sufficient, but we will exclude
this trivial statistic from consideration. According to Definition 1, if $T$ is sufficient
for $\theta$, we need only concentrate on $T$ since it exhausts all the information that
the sample has about $\theta$. In practice, there will be several sufficient statistics for a
family of distributions, and the question arises as to which of these should be used in
a given problem. We will return to this topic in more detail later in this section.


Example 2. We show that the statistic $Y_1$ in Example 1 is sufficient for $\mu$. By
construction $Y_2, \ldots, Y_n$ are iid $\mathcal{N}(0, 1)$ RVs that are independent of $Y_1$. Hence the
conditional distribution of $Y_2, \ldots, Y_n$, given $Y_1 = \sqrt{n}\,\bar{X}$, is the same as the
unconditional distribution of $(Y_2, \ldots, Y_n)$, which is multivariate normal with mean
$(0, 0, \ldots, 0)$ and dispersion matrix $\mathbf{I}_{n-1}$. Since this distribution is independent of $\mu$,
the conditional distribution of $(Y_1, Y_2, \ldots, Y_n)$, and hence $(X_1, X_2, \ldots, X_n)$, given
$Y_1 = y_1$, is also independent of $\mu$ and $Y_1$ is sufficient.


Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Intuitively, if a loaded coin
with probability $p$ of heads is tossed $n$ times, it seems unnecessary to know which
toss resulted in a head. To estimate $p$, it should be sufficient to know the number of
heads in $n$ trials. We show that this is consistent with our definition. Let
$T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$. Then

$$P\Big\{X_1 = x_1, \ldots, X_n = x_n \,\Big|\, \sum_{i=1}^n X_i = t\Big\} = \frac{P\{X_1 = x_1, \ldots, X_n = x_n, T = t\}}{\binom{n}{t} p^t (1-p)^{n-t}}$$

if $\sum_{i=1}^n x_i = t$, and $= 0$ otherwise. Thus, for $\sum_{i=1}^n x_i = t$, we have

$$P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \frac{p^{\sum x_i}(1-p)^{n - \sum x_i}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{1}{\binom{n}{t}},$$

which is independent of $p$. It is therefore sufficient to concentrate on $\sum_{i=1}^n X_i$.
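

The computation in Example 3 can be reproduced by brute-force enumeration. The sketch below is illustrative only: the helper name `conditional_given_sum` and the values $n = 4$, $p = 1/3$, $p = 4/5$ are assumptions chosen for the demonstration. It uses exact rational arithmetic to compute $P\{\mathbf{X} = \mathbf{x} \mid \sum X_i = t\}$ for every binary sequence under two different values of $p$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_sum(n, p):
    """Return {x: P(X = x | sum(X) = sum(x))} for n iid Bernoulli(p) variables."""
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x))
             for x in product((0, 1), repeat=n)}
    p_t = {}                                   # PMF of T = sum(X)
    for x, pr in joint.items():
        p_t[sum(x)] = p_t.get(sum(x), 0) + pr
    return {x: pr / p_t[sum(x)] for x, pr in joint.items()}

n = 4
cond_a = conditional_given_sum(n, Fraction(1, 3))   # illustrative value of p
cond_b = conditional_given_sum(n, Fraction(4, 5))   # a different value of p
```

Both dictionaries should be identical, with each sequence receiving conditional probability $1/\binom{n}{t}$, which is what sufficiency of $\sum X_i$ requires.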
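

Example 4. Let $X_1, X_2$ be iid $P(\lambda)$ RVs. Then $X_1 + X_2$ is sufficient for $\lambda$, for

$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \begin{cases} \dfrac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{X_1 + X_2 = t\}} & \text{if } t = x_1 + x_2,\ x_i = 0, 1, 2, \ldots, \\[2ex] 0 & \text{otherwise.} \end{cases}$$

Thus, for $x_i = 0, 1, 2, \ldots$, $i = 1, 2$, $x_1 + x_2 = t$, we have

$$P\{X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = t\} = \binom{t}{x_1}\left(\frac{1}{2}\right)^t,$$

which is independent of $\lambda$.
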
Not every statistic is sufficient. 


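Example 5. Let $X_1, X_2$ be iid $P(\lambda)$ RVs, and consider the statistic $T = X_1 + 2X_2$.
We have

$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{e^{-\lambda}(\lambda e^{-\lambda})}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}}$$

$$= \frac{\lambda e^{-2\lambda}}{\lambda e^{-2\lambda} + (\lambda^2/2)e^{-2\lambda}} = \frac{1}{1 + (\lambda/2)},$$

and we see that $X_1 + 2X_2$ is not sufficient for $\lambda$.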
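

The contrast between Examples 4 and 5 can be illustrated numerically. The following sketch is an illustration under stated assumptions: the function names `pois` and `cond_prob`, the weight argument `w2`, and the two values of $\lambda$ are hypothetical choices, not part of the text. It evaluates $P\{X_1 = 0, X_2 = 1 \mid X_1 + w_2 X_2 = t\}$ for $w_2 = 1$ and $w_2 = 2$.

```python
from math import exp, factorial

def pois(k, lam):
    """Poisson PMF P(X = k) with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

def cond_prob(x1, x2, lam, w2):
    """P(X1 = x1, X2 = x2 | X1 + w2*X2 = t) for iid Poisson(lam), t = x1 + w2*x2."""
    t = x1 + w2 * x2
    # Sum the joint PMF over all pairs (a, b) with a + w2*b = t.
    denom = sum(pois(t - w2 * b, lam) * pois(b, lam) for b in range(t // w2 + 1))
    return pois(x1, lam) * pois(x2, lam) / denom

same = [cond_prob(0, 1, lam, 1) for lam in (0.5, 2.0)]     # T = X1 + X2
differ = [cond_prob(0, 1, lam, 2) for lam in (0.5, 2.0)]   # T = X1 + 2*X2
```

For $T = X_1 + X_2$ the value is $1/2$ at both choices of $\lambda$, whereas for $T = X_1 + 2X_2$ it equals $1/(1 + \lambda/2)$ and so changes with $\lambda$.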


Definition 1 is not a constructive definition since it requires that we first guess a
statistic $T$ and then check to see whether $T$ is sufficient. Moreover, the procedure for
checking that $T$ is sufficient is quite time consuming. We now give a criterion for
determining sufficient statistics.


Theorem 1 (Factorization Criterion). Let $X_1, X_2, \ldots, X_n$ be discrete RVs
with PMF $p_\theta(x_1, x_2, \ldots, x_n)$, $\theta \in \Theta$. Then $T(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if
and only if we can write

$$p_\theta(x_1, x_2, \ldots, x_n) = h(x_1, x_2, \ldots, x_n)\, g_\theta(T(x_1, x_2, \ldots, x_n)), \tag{1}$$

where $h$ is a nonnegative function of $x_1, x_2, \ldots, x_n$ only and does not depend on $\theta$,
and $g_\theta$ is a nonnegative nonconstant function of $\theta$ and $T(x_1, x_2, \ldots, x_n)$ only. The
statistic $T(X_1, \ldots, X_n)$ and parameter $\theta$ may be multidimensional.


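Proof. Let $T$ be sufficient for $\theta$. Then $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is independent of $\theta$,
and we may write

$$P_\theta\{\mathbf{X} = \mathbf{x}\} = P_\theta\{\mathbf{X} = \mathbf{x}, T(X_1, X_2, \ldots, X_n) = t\} = P_\theta\{T = t\}\, P\{\mathbf{X} = \mathbf{x} \mid T = t\},$$

provided that $P\{\mathbf{X} = \mathbf{x} \mid T = t\}$ is well defined.

For values of $\mathbf{x}$ for which $P_\theta\{\mathbf{X} = \mathbf{x}\} = 0$ for all $\theta$, let us define
$h(x_1, x_2, \ldots, x_n) = 0$, and for $\mathbf{x}$ for which $P_\theta\{\mathbf{X} = \mathbf{x}\} > 0$ for some $\theta$, we define

$$h(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\}$$

and define

$$g_\theta(T(x_1, x_2, \ldots, x_n)) = P_\theta\{T(x_1, \ldots, x_n) = t\}.$$

Thus we see that (1) holds.

Conversely, suppose that (1) holds. Then for fixed $t_0$ we have

$$P_\theta\{T = t_0\} = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t_0} P_\theta\{\mathbf{X} = \mathbf{x}\} = \sum_{\mathbf{x}:\, T(\mathbf{x}) = t_0} g_\theta(T(\mathbf{x}))\, h(\mathbf{x}) = g_\theta(t_0) \sum_{T(\mathbf{x}) = t_0} h(\mathbf{x}).$$

Suppose that $P_\theta\{T = t_0\} > 0$ for some $\theta$. Then

$$P_\theta\{\mathbf{X} = \mathbf{x} \mid T = t_0\} = \frac{P_\theta\{\mathbf{X} = \mathbf{x}, T(\mathbf{X}) = t_0\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \begin{cases} 0 & \text{if } T(\mathbf{x}) \ne t_0, \\[1ex] \dfrac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} & \text{if } T(\mathbf{x}) = t_0. \end{cases}$$

Thus, if $T(\mathbf{x}) = t_0$, then

$$\frac{P_\theta\{\mathbf{X} = \mathbf{x}\}}{P_\theta\{T(\mathbf{X}) = t_0\}} = \frac{g_\theta(t_0)\, h(\mathbf{x})}{g_\theta(t_0) \sum_{T(\mathbf{x}) = t_0} h(\mathbf{x})},$$

which is free of $\theta$, as asserted. This completes the proof.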


Remark 2. Theorem 1 also holds for the continuous case and, indeed, for quite
arbitrary families of distributions. The general proof is beyond the scope of this book,
and we refer the reader to Halmos and Savage [38] or to Lehmann [63, pp. 53-56].
We will assume that the result holds for the absolutely continuous case. We leave
the reader to write the analog of (1) and to prove it, at least under the regularity
conditions assumed in Theorem 4.4.2.


Remark 3. Theorem 1 (and its analog for the continuous case) holds if $\theta$ is a
vector of parameters and $T$ is a multiple RV, and we say that $T$ is jointly sufficient
for $\theta$. We emphasize that even if $\theta$ is scalar, $T$ may be multidimensional (Example 9).
If $\theta$ and $T$ are of the same dimension, and if $T$ is sufficient for $\theta$, it does not follow
that the $j$th component of $T$ is sufficient for the $j$th component of $\theta$ (Example 8).
The converse is true under mild conditions (see Fraser [29, p. 21]).


Remark 4. If $T$ is sufficient for $\theta$, any one-to-one function of $T$ is also sufficient.
This follows from Theorem 1 since if $U = k(T)$ is a one-to-one function of $T$, then
$t = k^{-1}(u)$, and we can write

$$f_\theta(\mathbf{x}) = g_\theta(t)\, h(\mathbf{x}) = g_\theta(k^{-1}(u))\, h(\mathbf{x}) = g_\theta^*(u)\, h(\mathbf{x}).$$

If $T_1$, $T_2$ are two distinct sufficient statistics, then

$$f_\theta(\mathbf{x}) = g_\theta(t_1)\, h_1(\mathbf{x}) = g_\theta(t_2)\, h_2(\mathbf{x}),$$

and it follows that $T_1$ is a function of $T_2$. It does not follow, however, that every
function of a sufficient statistic is itself sufficient. For example, in sampling from
a normal population, $\bar{X}$ is sufficient for the mean $\mu$ but $\bar{X}^2$ is not. Note that $\bar{X}$ is
sufficient for $\mu^2$.


Remark 5. As a rule, Theorem 1 cannot be used to show that a given statistic
$T$ is not sufficient. To do this, one would normally have to use the definition of
sufficiency. In most cases Theorem 1 will lead to a sufficient statistic if it exists.


Remark 6. If $T(\mathbf{X})$ is sufficient for $\{F_\theta : \theta \in \Theta\}$, then $T$ is sufficient for
$\{F_\theta : \theta \in \omega\}$, where $\omega \subseteq \Theta$. This follows trivially from the definition.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is
sufficient. We have

$$P_p\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = p^{\sum_1^n x_i}(1 - p)^{n - \sum_1^n x_i},$$

and taking

$$h(x_1, x_2, \ldots, x_n) = 1 \quad \text{and} \quad g_p(x_1, x_2, \ldots, x_n) = (1 - p)^n \left(\frac{p}{1 - p}\right)^{\sum_1^n x_i},$$

we see that $T$ is sufficient. We note that $T_1(\mathbf{X}) = (X_1, X_2 + X_3 + \cdots + X_n)$ and
$T_2(\mathbf{X}) = (X_1 + X_2, X_3, X_4 + X_5 + \cdots + X_n)$ are also sufficient for $p$, although $T$
is preferable to $T_1$ or $T_2$.


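Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF

$$P\{X_i = k\} = \frac{1}{N}, \qquad k = 1, 2, \ldots, N;\ i = 1, 2, \ldots, n.$$

Then

$$P_N\{X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n\} = \frac{1}{N^n} \quad \text{if } 1 \le k_1, \ldots, k_n \le N,$$

$$= \frac{1}{N^n}\, \varphi\Big(1, \min_{1 \le i \le n} k_i\Big)\, \varphi\Big(\max_{1 \le i \le n} k_i, N\Big),$$

where $\varphi(a, b) = 1$ if $b \ge a$, and $= 0$ if $b < a$. It follows, by taking
$g_N[\max(k_1, \ldots, k_n)] = (1/N^n)\, \varphi(\max_{1 \le i \le n} k_i, N)$ and $h = \varphi(1, \min k_i)$, that
$\max(X_1, X_2, \ldots, X_n)$ is sufficient for the family of joint PMFs $P_N$.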


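Example 8. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$, where both $\mu$ and
$\sigma^2$ are unknown. The joint PDF of $(X_1, X_2, \ldots, X_n)$ is

$$\frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left\{-\frac{\sum (x_i - \mu)^2}{2\sigma^2}\right\} = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left(-\frac{\sum x_i^2}{2\sigma^2} + \frac{\mu \sum x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2}\right).$$

It follows that the statistic $T(X_1, \ldots, X_n) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$
is jointly sufficient for the parameter $(\mu, \sigma^2)$. An equivalent sufficient statistic that
is frequently used is $T_1(X_1, \ldots, X_n) = (\bar{X}, S^2)$. Note that $\bar{X}$ is not sufficient for $\mu$
if $\sigma^2$ is unknown, and $S^2$ is not sufficient for $\sigma^2$ if $\mu$ is unknown. If, however, $\sigma^2$ is
known, $\bar{X}$ is sufficient for $\mu$. If $\mu = \mu_0$ is known, $\sum_{i=1}^n (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.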


Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}, & -\dfrac{\theta}{2} \le x \le \dfrac{\theta}{2},\ \theta > 0, \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$

The joint PDF of $X_1, X_2, \ldots, X_n$ is given by

$$f_\theta(x_1, x_2, \ldots, x_n) = \frac{1}{\theta^n}\, I_A(x_1, \ldots, x_n),$$

where

$$A = \left\{(x_1, x_2, \ldots, x_n) : -\frac{\theta}{2} \le \min x_i \le \max x_i \le \frac{\theta}{2}\right\}.$$

It follows that $(X_{(1)}, X_{(n)})$ is sufficient for $\theta$.

We note that the order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is also sufficient. Note also
that the parameter is one-dimensional, the statistic $(X_{(1)}, X_{(n)})$ is two-dimensional,
and the order statistic is $n$-dimensional.


In Example 9 we saw that the order statistic is sufficient. This is not a mere
coincidence. In fact, if $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ are exchangeable, the joint PDF of $\mathbf{X}$ is a
symmetric function of its arguments. Thus

$$f_\theta(x_1, x_2, \ldots, x_n) = f_\theta(x_{(1)}, x_{(2)}, \ldots, x_{(n)}),$$

and it follows that the order statistic is sufficient for $f_\theta$.


The concept of sufficiency is used frequently with another concept, called com- 
pleteness, which we now define. 


Definition 2. Let $\{f_\theta(x), \theta \in \Theta\}$ be a family of PDFs (or PMFs). We say that
this family is complete if

$$E_\theta\, g(X) = 0 \quad \text{for all } \theta \in \Theta$$

implies that

$$P_\theta\{g(X) = 0\} = 1 \quad \text{for all } \theta \in \Theta.$$


Definition 3. A statistic $T(\mathbf{X})$ is said to be complete if the family of distributions
of $T$ is complete.


In Definition 3, $\mathbf{X}$ will usually be a multiple RV. The family of distributions of $T$
is obtained from the family of distributions of $X_1, X_2, \ldots, X_n$ by the usual
transformation technique discussed in Section 4.4.


Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a
sufficient statistic. We show that $T$ is also complete; that is, the family of distributions
of $T$, $\{b(n, p), 0 < p < 1\}$, is complete. The condition

$$E_p\, g(T) = \sum_{t=0}^n g(t) \binom{n}{t} p^t (1 - p)^{n-t} = 0 \quad \text{for all } p \in (0, 1)$$

may be rewritten as

$$(1 - p)^n \sum_{t=0}^n g(t) \binom{n}{t} \left(\frac{p}{1 - p}\right)^t = 0 \quad \text{for all } p \in (0, 1).$$

This is a polynomial in $p/(1 - p)$. Hence the coefficients must vanish, and it follows
that $g(t) = 0$ for $t = 0, 1, 2, \ldots, n$, as required.


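Example 11. Let $X$ be $\mathcal{N}(0, \theta)$. Then the family of PDFs $\{\mathcal{N}(0, \theta), \theta > 0\}$ is not
complete since $EX = 0$ and $g(x) = x$ is not identically zero. Note that $T(X) = X^2$
is complete, for the PDF of $X^2 \sim \theta\chi^2(1)$ is given by

$$f_\theta(t) = \begin{cases} \dfrac{e^{-t/2\theta}}{\sqrt{2\pi\theta t}}, & t > 0, \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$

We have

$$E_\theta\, g(T) = \frac{1}{\sqrt{2\pi\theta}} \int_0^\infty g(t)\, t^{-1/2} e^{-t/2\theta}\, dt = 0 \quad \text{for all } \theta > 0,$$

which holds if and only if $\int_0^\infty g(t)\, t^{-1/2} e^{-t/2\theta}\, dt = 0$, and using the uniqueness
property of Laplace transforms, it follows that

$$g(t)\, t^{-1/2} = 0 \quad \text{for all } t > 0,$$

that is, $g(t) = 0$.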


The next example illustrates the existence of a sufficient statistic that is not com- 
plete. 


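Example 12. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\theta, \theta^2)$. Then
$T = \left(\sum_1^n X_i, \sum_1^n X_i^2\right)$ is sufficient for $\theta$. However, $T$ is not complete since

$$E_\theta\left\{2\left(\sum_1^n X_i\right)^2 - (n + 1) \sum_1^n X_i^2\right\} = 0 \quad \text{for all } \theta,$$

and the function $g(x_1, \ldots, x_n) = 2\left(\sum_1^n x_i\right)^2 - (n + 1) \sum_1^n x_i^2$ is not identically zero.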
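

The expectation claimed in Example 12 can be verified directly from the first two moments of the sample. The following worked derivation is a sketch that uses only $EX_i = \theta$, $\operatorname{var}(X_i) = \theta^2$, and independence.

```latex
\begin{align*}
E_\theta\Big(\sum_{i=1}^n X_i\Big)^2
  &= \operatorname{var}\Big(\sum_{i=1}^n X_i\Big) + \Big(E_\theta \sum_{i=1}^n X_i\Big)^2
   = n\theta^2 + n^2\theta^2 = n(n+1)\theta^2, \\
E_\theta \sum_{i=1}^n X_i^2
  &= n\big(\operatorname{var}(X_1) + (E_\theta X_1)^2\big) = 2n\theta^2, \\
E_\theta\Big\{2\Big(\sum_{i=1}^n X_i\Big)^2 - (n+1)\sum_{i=1}^n X_i^2\Big\}
  &= 2n(n+1)\theta^2 - (n+1)\,2n\theta^2 = 0 .
\end{align*}
```

The cancellation holds for every $\theta$, which is exactly what rules out completeness of $T$.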


Example 13. Let $X \sim U(0, \theta)$, $\theta \in (0, \infty)$. We show that the family of PDFs of
$X$ is complete. We need to show that

$$E_\theta\, g(X) = \frac{1}{\theta} \int_0^\theta g(x)\, dx = 0 \quad \text{for all } \theta > 0$$

if and only if $g(x) = 0$ for all $x$. In general, this result follows from Lebesgue
integration theory. If $g$ is continuous, we differentiate both sides in

$$\int_0^\theta g(x)\, dx = 0$$

to get $g(\theta) = 0$ for all $\theta > 0$.

Now let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Then the PDF of $X_{(n)}$ is given by

$$f_n(x \mid \theta) = \begin{cases} n\theta^{-n} x^{n-1}, & 0 < x < \theta, \\ 0, & \text{otherwise.} \end{cases}$$

We see by a similar argument that $X_{(n)}$ is complete, which is the same as saying that
$\{f_n(x \mid \theta); \theta > 0\}$ is a complete family of densities. Clearly, $X_{(n)}$ is sufficient.


Example 14. Let $X_1, X_2, \ldots, X_n$ be a sample from PMF

$$P_N(x) = \begin{cases} \dfrac{1}{N}, & x = 1, 2, \ldots, N, \\[1.5ex] 0, & \text{otherwise.} \end{cases}$$

We first show that the family of PMFs $\{P_N, N \ge 1\}$ is complete. We have

$$E_N\, g(X) = \frac{1}{N} \sum_{k=1}^N g(k) = 0 \quad \text{for all } N \ge 1,$$

and this happens if and only if $g(k) = 0$, $k = 1, 2, \ldots, N$. Next we consider the
family of PMFs of $X_{(n)} = \max(X_1, \ldots, X_n)$. The PMF of $X_{(n)}$ is given by

$$P_N^{(n)}(x) = \frac{x^n}{N^n} - \frac{(x - 1)^n}{N^n}, \qquad x = 1, 2, \ldots, N.$$

Also,

$$E_N\, g(X_{(n)}) = \sum_{k=1}^N g(k) \left[\frac{k^n}{N^n} - \frac{(k - 1)^n}{N^n}\right] = 0 \quad \text{for all } N \ge 1.$$

Here

$$E_1\, g(X_{(n)}) = g(1) = 0$$

implies that $g(1) = 0$. Again,

$$E_2\, g(X_{(n)}) = \frac{g(1)}{2^n} + g(2)\left(1 - \frac{1}{2^n}\right) = 0$$

so that $g(2) = 0$.

Using an induction argument, we conclude that $g(1) = g(2) = \cdots = g(N) = 0$
and hence $g(x) = 0$. It follows that $P_N^{(n)}$ is a complete family of distributions, and
$X_{(n)}$ is a complete sufficient statistic.

Now suppose that we exclude the value $N = n_0$ for some fixed $n_0 \ge 1$ from
the family $\{P_N : N \ge 1\}$. Let us write $\mathcal{P} = \{P_N : N \ge 1, N \ne n_0\}$. Then $\mathcal{P}$ is
not complete. We ask the reader to show that the class of all functions $g$ such that
$E_P\, g(X) = 0$ for all $P \in \mathcal{P}$ consists of functions of the form

$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1, n_0 + 2, n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases}$$

where $c$ is a constant, $c \ne 0$.
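

The last claim of Example 14 can be checked for a particular case. The sketch below is illustrative only: the values $n_0 = 4$ and $c = 3$, the range of $N$ examined, and the helper name `expect_g` are assumptions made for the demonstration. It evaluates $E_N\, g(X)$ exactly for the discrete uniform family.

```python
from fractions import Fraction

def expect_g(g, N):
    """E_N g(X) for X uniform on {1, ..., N}, computed exactly."""
    return sum(Fraction(g(k)) for k in range(1, N + 1)) / N

n0, c = 4, 3   # hypothetical choices for illustration

def g(k):
    """The function from Example 14: c at n0, -c at n0 + 1, and 0 elsewhere."""
    if k == n0:
        return c
    if k == n0 + 1:
        return -c
    return 0

expectations = {N: expect_g(g, N) for N in range(1, 12)}
```

The expectation vanishes for every $N \ne n_0$ in the range examined and equals $c/n_0$ at $N = n_0$, so $g$ is a nonzero function with zero mean over the reduced family $\mathcal{P}$.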


Remark 7. Completeness is a property of a family of distributions. In Remark 6
we saw that if a statistic is sufficient for a class of distributions, it is sufficient for
any subclass of those distributions. Completeness works in the opposite direction.
Example 14 shows that the exclusion of even one member from the family
$\{P_N : N \ge 1\}$ destroys completeness.


The following result covers a large class of probability distributions for which a 
complete sufficient statistic exists. 


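Theorem 2. Let $\{f_\theta : \theta \in \Theta\}$ be a $k$-parameter exponential family given by

$$f_\theta(\mathbf{x}) = \exp\left\{\sum_{j=1}^k Q_j(\theta)\, T_j(\mathbf{x}) + D(\theta) + S(\mathbf{x})\right\}, \tag{2}$$

where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Theta$, an interval in $\mathcal{R}_k$, $T_1, T_2, \ldots, T_k$, and $S$ are defined
on $\mathcal{R}_n$, $\mathbf{T} = (T_1, T_2, \ldots, T_k)$, and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $k \le n$. Let
$\mathbf{Q} = (Q_1, Q_2, \ldots, Q_k)$, and suppose that the range of $\mathbf{Q}$ contains an open set in $\mathcal{R}_k$. Then

$$\mathbf{T} = (T_1(\mathbf{X}), T_2(\mathbf{X}), \ldots, T_k(\mathbf{X}))$$

is a complete sufficient statistic.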


Proof. For a complete proof in a general setting, we refer the reader to Lehmann
[63, pp. 142-143]. Essentially, the unicity of the Laplace transform is used on the
probability distribution induced by $T$. We will content ourselves here by proving the
result for the $k = 1$ case when $f_\theta$ is a PMF.

Let us write $Q(\theta) = \theta$ in (2), and let $(\alpha, \beta) \subseteq \Theta$. We wish to show that

$$E_\theta\, g(T(\mathbf{X})) = \sum_t g(t)\, P_\theta\{T(\mathbf{X}) = t\} = \sum_t g(t) \exp[\theta t + D(\theta) + S^*(t)] = 0 \quad \text{for all } \theta \tag{3}$$

implies that $g(t) = 0$.

Let us write $x^+ = x$ if $x \ge 0$, $= 0$ if $x < 0$, and $x^- = -x$ if $x < 0$, $= 0$ if $x \ge 0$.
Then $g(t) = g^+(t) - g^-(t)$, and both $g^+$ and $g^-$ are nonnegative functions. In terms
of $g^+$ and $g^-$, (3) is the same as

$$\sum_t g^+(t)\, e^{\theta t + S^*(t)} = \sum_t g^-(t)\, e^{\theta t + S^*(t)} \tag{4}$$

for all $\theta$.

Let $\theta_0 \in (\alpha, \beta)$ be fixed, and write

$$p^+(t) = \frac{g^+(t)\, e^{\theta_0 t + S^*(t)}}{\sum_t g^+(t)\, e^{\theta_0 t + S^*(t)}} \quad \text{and} \quad p^-(t) = \frac{g^-(t)\, e^{\theta_0 t + S^*(t)}}{\sum_t g^-(t)\, e^{\theta_0 t + S^*(t)}}. \tag{5}$$

Then both $p^+$ and $p^-$ are PMFs, and it follows from (4) that

$$\sum_t e^{\delta t} p^+(t) = \sum_t e^{\delta t} p^-(t) \tag{6}$$

for all $\delta \in (\alpha - \theta_0, \beta - \theta_0)$. By the uniqueness of MGFs (6) implies that

$$p^+(t) = p^-(t) \quad \text{for all } t$$

and hence that $g^+(t) = g^-(t)$ for all $t$, which is equivalent to $g(t) = 0$ for all $t$.
Since $T$ is clearly sufficient (by the factorization criterion), it is proved that $T$ is a
complete sufficient statistic.


Example 15. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$
are unknown. We know that the family of distributions of $\mathbf{X} = (X_1, \ldots, X_n)$ is a
two-parameter exponential family with $T(X_1, \ldots, X_n) = \left(\sum_1^n X_i, \sum_1^n X_i^2\right)$. From
Theorem 2 it follows that $T$ is a complete sufficient statistic. Examples 10 and 11
fall in the domain of Theorem 2.


In Examples 6, 8, and 9 we have shown that a given family of probability
distributions that admits a nontrivial sufficient statistic usually admits several sufficient
statistics. Clearly, we would like to be able to choose the sufficient statistic that
results in the greatest reduction of data collection. We next study the notion of a
minimal sufficient statistic. For this purpose it is convenient to introduce the notion of a
sufficient partition. The reader will recall that a partition of a space $\mathcal{X}$ is just a
collection of disjoint sets $E_\alpha$ such that $\sum_\alpha E_\alpha = \mathcal{X}$. Any statistic $T(X_1, X_2, \ldots, X_n)$
induces a partition of the space of values of $(X_1, X_2, \ldots, X_n)$, that is, $T$ induces a
covering of $\mathcal{X}$ by a family $\mathfrak{U}$ of disjoint sets
$A_t = \{(x_1, x_2, \ldots, x_n) \in \mathcal{X} : T(x_1, x_2, \ldots, x_n) = t\}$, where $t$ belongs to the range of $T$.
The sets $A_t$ are called partition sets. Conversely, given a partition, any assignment of
a number to each set so that no two partition sets have the same number assigned
defines a statistic. Clearly, this function is not, in general, unique.


Definition 4. Let $\{F_\theta : \theta \in \Theta\}$ be a family of DFs, and $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
be a sample from $F_\theta$. Let $\mathfrak{U}$ be a partition of the sample space induced by a statistic
$T = T(X_1, X_2, \ldots, X_n)$. We say that $\mathfrak{U} = \{A_t : t \text{ is in the range of } T\}$ is a sufficient
partition for $\theta$ (or the family $\{F_\theta : \theta \in \Theta\}$) if the conditional distribution of $\mathbf{X}$, given
$T = t$, does not depend on $\theta$ for any $A_t$, provided that the conditional probability is
well defined.


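Example 16. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. The sample space of values
of $(X_1, X_2, \ldots, X_n)$ is the set of $n$-tuples $(x_1, x_2, \ldots, x_n)$, where each $x_i = 0$ or
$= 1$, and consists of $2^n$ points. Let $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i$, and consider the
partition $\mathfrak{U} = \{A_0, A_1, \ldots, A_n\}$, where $\mathbf{x} \in A_j$ if and only if $\sum_1^n x_i = j$, $0 \le j \le n$.
Each $A_j$ contains $\binom{n}{j}$ sample points. The conditional probability

$$P_p\{\mathbf{x} \mid A_j\} = \frac{P_p\{\mathbf{x}\}}{P_p(A_j)} = \binom{n}{j}^{-1} \quad \text{if } \mathbf{x} \in A_j,$$

and we see that $\mathfrak{U}$ is a sufficient partition.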


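Example 17. Let $X_1, X_2, \ldots, X_n$ be iid $U[0, \theta]$ RVs. Consider the statistic
$T(\mathbf{X}) = \max_{1 \le i \le n} X_i$. The space of values of $X_1, X_2, \ldots, X_n$ is the set of points
$\{\mathbf{x} : 0 \le x_i \le \theta, i = 1, 2, \ldots, n\}$. $T$ induces a partition $\mathfrak{U}$ on this set. The sets of
this partition are $A_t = \{(x_1, x_2, \ldots, x_n) : \max(x_1, \ldots, x_n) = t\}$, $t \in [0, \theta]$.
We have

$$f_\theta(\mathbf{x} \mid t) = \frac{f_\theta(\mathbf{x})}{f_\theta^T(t)} \quad \text{if } \mathbf{x} \in A_t,$$

where $f_\theta^T(t)$ is the PDF of $T$. We have

$$f_\theta(\mathbf{x} \mid t) = \frac{1/\theta^n}{n t^{n-1}/\theta^n} = \frac{1}{n t^{n-1}} \quad \text{if } \mathbf{x} \in A_t.$$

It follows that $\mathfrak{U} = \{A_t\}$ defines a sufficient partition.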


Remark 8. Clearly, a sufficient statistic $T$ for a family of DFs $\{F_\theta : \theta \in \Theta\}$
induces a sufficient partition; and conversely, given a sufficient partition, we can
define a sufficient statistic (not necessarily uniquely) for the family.


Remark 9. Two statistics $T_1$, $T_2$ that define the same partition must be in
one-to-one correspondence, that is, there exists a function $h$ such that $T_1 = h(T_2)$ with
a unique inverse, $T_2 = h^{-1}(T_1)$. It follows that if $T_1$ is sufficient, every one-to-one
function of $T_1$ is also sufficient.


Let $\mathfrak{U}_1$, $\mathfrak{U}_2$ be two partitions of a space $\mathcal{X}$. We say that $\mathfrak{U}_1$ is a subpartition of $\mathfrak{U}_2$
if every partition set in $\mathfrak{U}_2$ is a union of sets of $\mathfrak{U}_1$. We sometimes say also that $\mathfrak{U}_1$
is finer than $\mathfrak{U}_2$ ($\mathfrak{U}_2$ is coarser than $\mathfrak{U}_1$) or that $\mathfrak{U}_2$ is a reduction of $\mathfrak{U}_1$. In this case,
a statistic $T_2$ that defines $\mathfrak{U}_2$ must be a function of any statistic $T_1$ that defines $\mathfrak{U}_1$.
Clearly, this function need not have a unique inverse unless the two partitions have
exactly the same partition sets.

Given a family of distributions $\{F_\theta : \theta \in \Theta\}$ for which a sufficient partition exists,
we seek to find a sufficient partition $\mathfrak{U}$ that is as coarse as possible; that is, any
reduction of $\mathfrak{U}$ leads to a partition that is not sufficient.


Definition 5. A partition $\mathfrak{U}$ is said to be minimal sufficient if


(i) $\mathfrak{U}$ is a sufficient partition, and
(ii) if $\mathcal{C}$ is any sufficient partition, $\mathcal{C}$ is a subpartition of $\mathfrak{U}$.


The question of the existence of the minimal partition was settled by Lehmann and
Scheffé [62] and, in general, involves measure-theoretic considerations. However,
in the cases that we consider where the sample space is either discrete or a
finite-dimensional Euclidean space, and the family of distributions of $\mathbf{X}$ is defined by a
family of PDFs (PMFs) $\{f_\theta, \theta \in \Theta\}$, such difficulties do not arise. The construction
may be described as follows.

Two points $\mathbf{x}$ and $\mathbf{y}$ in the sample space are said to be likelihood equivalent, and
we write $\mathbf{x} \sim \mathbf{y}$, if and only if there exists a $k(\mathbf{y}, \mathbf{x}) \ne 0$ which does not depend
on $\theta$ such that $f_\theta(\mathbf{y}) = k(\mathbf{y}, \mathbf{x})\, f_\theta(\mathbf{x})$. We leave the reader to check that "$\sim$" is an
equivalence relation (that is, it is reflexive, symmetric, and transitive) and hence "$\sim$"
defines a partition of the sample space. This partition defines the minimal sufficient
partition.


Example 18. Consider Example 16 again. Then

$$\frac{f_\theta(\mathbf{x})}{f_\theta(\mathbf{y})} = p^{\sum x_i - \sum y_i}(1 - p)^{-\sum x_i + \sum y_i},$$

and this ratio is independent of $p$ if and only if

$$\sum_1^n x_i = \sum_1^n y_i,$$

so that $\mathbf{x} \sim \mathbf{y}$ if and only if $\sum_1^n x_i = \sum_1^n y_i$. It follows that the partition
$\mathfrak{U} = \{A_0, A_1, \ldots, A_n\}$, where $\mathbf{x} \in A_j$ if and only if $\sum_1^n x_i = j$, introduced in
Example 16, is minimal sufficient.
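

The likelihood-equivalence construction can be carried out mechanically for a small Bernoulli sample. The sketch below is an illustration, not the authors' procedure: the helper names, the choice $n = 3$, and the finite grid of $p$ values are assumptions. Checking the ratio on a finite grid is only a necessary condition for equivalence in general, although for this family a ratio that agrees at these three points is constant in $p$.

```python
from fractions import Fraction
from itertools import product

def likelihood(x, p):
    """Joint PMF of iid Bernoulli(p) observations at the point x."""
    return p ** sum(x) * (1 - p) ** (len(x) - sum(x))

def equivalent(x, y, grid):
    """True if f_p(y) / f_p(x) takes the same value at every p in grid."""
    ratios = {likelihood(y, p) / likelihood(x, p) for p in grid}
    return len(ratios) == 1

n = 3
grid = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]  # illustrative p values

classes = []                       # each entry is one set of the partition
for x in product((0, 1), repeat=n):
    for cls in classes:
        if equivalent(cls[0], x, grid):
            cls.append(x)
            break
    else:
        classes.append([x])
```

The resulting classes coincide with the sets $A_0, A_1, \ldots, A_n$ of Example 16: there are $n + 1$ of them, and all points in a class share the same value of $\sum x_i$.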


A rigorous proof of the assertion above is beyond the scope of this book. The
basic ideas are outlined in the following theorem.


Theorem 3. The relation "$\sim$" defined above induces a minimal sufficient partition.


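Proof. If $T$ is a sufficient statistic, we have to show that $\mathbf{x} \sim \mathbf{y}$ whenever
$T(\mathbf{x}) = T(\mathbf{y})$. This will imply that every set of the minimal sufficient partition is a union of
sets of the form $A_t = \{T = t\}$, proving condition (ii) of Definition 5.

Sufficiency of $T$ means that whenever $\mathbf{x} \in A_t$, then

$$f_\theta\{\mathbf{x} \mid T = t\} = \frac{f_\theta(\mathbf{x})}{f_\theta^T(t)}$$

is free of $\theta$. It follows that if both $\mathbf{x}$ and $\mathbf{y} \in A_t$, then

$$\frac{f_\theta(\mathbf{x} \mid t)}{f_\theta(\mathbf{y} \mid t)} = \frac{f_\theta(\mathbf{x})}{f_\theta(\mathbf{y})}$$

is independent of $\theta$, and hence $\mathbf{x} \sim \mathbf{y}$.

To prove the sufficiency of the minimal sufficient partition $\mathfrak{U}$, let $T_1$ be an RV
that induces $\mathfrak{U}$. Then $T_1$ takes on distinct values over distinct sets of $\mathfrak{U}$ but remains
constant on the same set. If $\mathbf{x} \in \{T_1 = t_1\}$, then

$$f_\theta(\mathbf{x} \mid T_1 = t_1) = \frac{f_\theta(\mathbf{x})}{P_\theta\{T_1 = t_1\}}. \tag{7}$$

Now

$$P_\theta\{T_1 = t_1\} = \int_{(\mathbf{y}:\, T_1(\mathbf{y}) = t_1)} f_\theta(\mathbf{y})\, d\mathbf{y} \quad \text{or} \quad \sum_{(\mathbf{y}:\, T_1(\mathbf{y}) = t_1)} f_\theta(\mathbf{y}),$$

depending on whether the joint distribution of $\mathbf{X}$ is absolutely continuous or discrete.
Since $f_\theta(\mathbf{x})/f_\theta(\mathbf{y})$ is independent of $\theta$ whenever $\mathbf{x} \sim \mathbf{y}$, it follows that the ratio on
the right-hand side of (7) does not depend on $\theta$. Thus $T_1$ is sufficient.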


Definition 6. A statistic that induces the minimal sufficient partition is called a 
minimal sufficient statistic. 


In view of Theorem 3, a minimal sufficient statistic is a function of every sufficient
statistic. It follows that if $T_1$ and $T_2$ are both minimal sufficient, then both must
induce the same minimal sufficient partition, and hence $T_1$ and $T_2$ must be equivalent
in the sense that each must be a function of the other (with probability 1).

How does one show that a statistic $T$ is not sufficient for a family of distributions
$\mathcal{P}$? Other than using the definition of sufficiency, one can sometimes use a result
of Lehmann and Scheffé [62] according to which if $T_1(\mathbf{X})$ is sufficient for $\theta$, $\theta \in \Theta$,
then $T_2(\mathbf{X})$ is also sufficient if and only if $T_1(\mathbf{X}) = g(T_2(\mathbf{X}))$ for some
Borel-measurable function $g$ and all $\mathbf{x} \in B$, where $B$ is a Borel set with $P_\theta B = 1$.

Another way to prove $T$ nonsufficient is to show that there exist $\mathbf{x}$ and $\mathbf{y}$ for which
$T(\mathbf{x}) = T(\mathbf{y})$ but $\mathbf{x}$ and $\mathbf{y}$ are not likelihood equivalent. We refer to Sampson and
Spencer [96] for this and similar results.

The following important result is proved in the next section. 


Theorem 4. A complete sufficient statistic is minimal sufficient. 


We emphasize that the converse is not true. A minimal sufficient statistic may not 
be complete. 


Example 19. Suppose that $X \sim U(\theta, \theta + 1)$. Then $X$ is a minimal sufficient
statistic. However, $X$ is not complete. Take, for example, $g(x) = \sin 2\pi x$. Then

$$E_\theta\, g(X) = \int_\theta^{\theta + 1} \sin 2\pi x\, dx = \int_0^1 \sin 2\pi x\, dx = 0$$

for all $\theta$, and it follows that $X$ is not complete.

If $X_1, X_2, \ldots, X_n$ is a sample from $U(\theta, \theta + 1)$, then $(X_{(1)}, X_{(n)})$ is minimal
sufficient for $\theta$ but not complete since

$$E_\theta\left(X_{(n)} - X_{(1)} - \frac{n - 1}{n + 1}\right) = 0$$

for all $\theta$.
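

The expectation used in the second part of Example 19 follows from the moments of uniform order statistics. As a brief sketch, write $U_i = X_i - \theta$ (a symbol introduced here for illustration), so that $U_1, \ldots, U_n$ are iid $U(0, 1)$ and the densities of $U_{(n)}$ and $U_{(1)}$ are $nu^{n-1}$ and $n(1-u)^{n-1}$ on $(0, 1)$.

```latex
\begin{align*}
E\,U_{(n)} &= \int_0^1 u \cdot n u^{n-1}\, du = \frac{n}{n+1}, \qquad
E\,U_{(1)} = \int_0^1 u \cdot n (1-u)^{n-1}\, du = \frac{1}{n+1}, \\
E_\theta\big(X_{(n)} - X_{(1)}\big) &= E\big(U_{(n)} - U_{(1)}\big)
  = \frac{n}{n+1} - \frac{1}{n+1} = \frac{n-1}{n+1} .
\end{align*}
```

Since the shift by $\theta$ cancels in the difference, the range has the same mean for every $\theta$, which gives a nonzero function of $(X_{(1)}, X_{(n)})$ with expectation zero.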


Finally, we consider statistics that have distributions free of the parameter(s) $\theta$
and seem to contain no information about $\theta$. We will see (Example 23) that such
statistics can sometimes provide useful information about $\theta$.


Definition 7. A statistic $A(\mathbf{X})$ is said to be ancillary if its distribution does not
depend on the underlying model parameter $\theta$.


Example 20. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(\mu, 1)$. Then the
statistic $A(\mathbf{X}) = (n - 1)S^2 = \sum_{i=1}^n (X_i - \bar{X})^2$ is ancillary since
$(n - 1)S^2 \sim \chi^2(n - 1)$, which is free of $\mu$. Some other ancillary statistics are

$$X_1 - \bar{X}, \qquad X_{(n)} - X_{(1)}, \qquad \text{and} \qquad \sum_{i=1}^n |X_i - \bar{X}|.$$

Also, $\bar{X}$, a complete sufficient statistic (hence minimal sufficient) for $\mu$, is
independent of $A(\mathbf{X})$.


Example 21. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathcal{N}(0, \sigma^2)$. Then
$A(\mathbf{X}) = \bar{X}$ follows $\mathcal{N}(0, n^{-1}\sigma^2)$ and is not ancillary with respect to the
parameter $\sigma^2$.


Example 22. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random
sample from the PDF $f(x - \theta)$, where $\theta \in \mathcal{R}$. Then the statistic
$A(\mathbf{X}) = (X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)})$ is ancillary for $\theta$.


In Example 20 we saw that $S^2$ was independent of the minimal sufficient statistic
$\bar{X}$. The following result due to Basu shows that it is not a mere coincidence.


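Theorem 5. If $S(\mathbf{X})$ is a complete sufficient statistic for $\theta$, then any ancillary
statistic $A(\mathbf{X})$ is independent of $S$.


Proof. If $A$ is ancillary, then $P_\theta\{A(\mathbf{X}) \le a\}$ is free of $\theta$ for all $a$. Consider the
conditional probability $g_a(s) = P\{A(\mathbf{X}) \le a \mid S(\mathbf{X}) = s\}$. Clearly,

$$E_\theta\{g_a(S(\mathbf{X}))\} = P_\theta\{A(\mathbf{X}) \le a\}.$$

Thus

$$E_\theta\big(g_a(S) - P\{A(\mathbf{X}) \le a\}\big) = 0$$

for all $\theta$. By completeness of $S$ it follows that

$$P_\theta\big\{g_a(S) - P\{A \le a\} = 0\big\} = 1;$$

that is,

$$P_\theta\{A(\mathbf{X}) \le a \mid S(\mathbf{X}) = s\} = P\{A(\mathbf{X}) \le a\}$$

with probability 1. Hence $A$ and $S$ are independent.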


The converse of Basu's theorem is not true. A statistic $S$ that is independent of
every ancillary statistic need not be complete (see, for example, Lehmann [60]).

The following example due to R. A. Fisher shows that if there is no sufficient
statistic for $\theta$ but there exists a reasonable statistic not independent of an ancillary
statistic $A(\mathbf{X})$, the recovery of information is sometimes helped by the ancillary
statistic via a conditional analysis. Unfortunately, the lack of uniqueness of ancillary
statistics creates problems with this conditional analysis.


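Example 23. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential
distribution with mean $\theta$, and let $Y_1, Y_2, \ldots, Y_n$ be another random sample from
an exponential distribution with mean $1/\theta$. Assume that the $X$'s and $Y$'s are
independent and consider the problem of estimation of $\theta$ based on the observations
$(X_1, X_2, \ldots, X_n; Y_1, Y_2, \ldots, Y_n)$. Let $S_1(\mathbf{x}) = \sum_{i=1}^n x_i$ and $S_2(\mathbf{y}) = \sum_{i=1}^n y_i$.
Then $(S_1(\mathbf{X}), S_2(\mathbf{Y}))$ is jointly sufficient for $\theta$. It is easily seen that $(S_1, S_2)$ is a
minimal sufficient statistic for $\theta$.

Consider the statistics

$$S(\mathbf{X}, \mathbf{Y}) = \left(\frac{S_1(\mathbf{X})}{S_2(\mathbf{Y})}\right)^{1/2}$$

and

$$A(\mathbf{X}, \mathbf{Y}) = \big(S_1(\mathbf{X})\, S_2(\mathbf{Y})\big)^{1/2}.$$

Then the joint PDF of $S$ and $A$ is given by

$$\frac{2}{[\Gamma(n)]^2} \exp\left\{-A(\mathbf{x}, \mathbf{y}) \left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right\} \frac{[A(\mathbf{x}, \mathbf{y})]^{2n-1}}{S(\mathbf{x}, \mathbf{y})},$$

and it is clear that $S$ and $A$ are not independent. The marginal distribution of $A$ is
given by the PDF

$$C(\mathbf{x}, \mathbf{y})\, [A(\mathbf{x}, \mathbf{y})]^{2n-1},$$

where $C(\mathbf{x}, \mathbf{y})$ is the constant of integration, which depends only on $\mathbf{x}$, $\mathbf{y}$, and $n$ but
not on $\theta$. In fact, $C(\mathbf{x}, \mathbf{y}) = 4K_0[2A(\mathbf{x}, \mathbf{y})]/[\Gamma(n)]^2$, where $K_0$ is the standard form
of a Bessel function (Watson [115]). Consequently $A$ is ancillary for $\theta$.

Clearly, the conditional PDF of $S$ given $A = a$ is of the form

$$\frac{1}{2K_0[2a]\, S(\mathbf{x}, \mathbf{y})} \exp\left\{-a \left(\frac{S(\mathbf{x}, \mathbf{y})}{\theta} + \frac{\theta}{S(\mathbf{x}, \mathbf{y})}\right)\right\}.$$

The amount of information lost by using $S(\mathbf{X}, \mathbf{Y})$ alone is the $[1/(2n + 1)]$th part of
the total, and this loss of information is gained by knowledge of the ancillary statistic
$A(\mathbf{X}, \mathbf{Y})$. These calculations are discussed in Example 8.5.9.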
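

The joint PDF quoted in Example 23 can be recovered by a change of variables. The sketch below assumes that $S = (S_1/S_2)^{1/2}$ and $A = (S_1 S_2)^{1/2}$, which is the reading consistent with the stated density, and uses lowercase $s_1, s_2, s, a$ (introduced here) for values of the statistics. It starts from the fact that $S_1$ and $S_2$ are independent gamma variables with shape $n$ and scales $\theta$ and $1/\theta$.

```latex
\begin{align*}
f_{S_1, S_2}(s_1, s_2)
  &= \frac{s_1^{n-1} e^{-s_1/\theta}}{\Gamma(n)\,\theta^n} \cdot
     \frac{\theta^n s_2^{n-1} e^{-\theta s_2}}{\Gamma(n)}
   = \frac{(s_1 s_2)^{n-1}}{[\Gamma(n)]^2}
     \exp\Big\{-\frac{s_1}{\theta} - \theta s_2\Big\}, \\
s_1 &= a s, \quad s_2 = \frac{a}{s}, \qquad
\left|\frac{\partial(s_1, s_2)}{\partial(s, a)}\right|
   = \left|\, a \cdot \frac{1}{s} + s \cdot \frac{a}{s^2} \right| = \frac{2a}{s}, \\
f_{S, A}(s, a)
  &= \frac{2}{[\Gamma(n)]^2}
     \exp\Big\{-a\Big(\frac{s}{\theta} + \frac{\theta}{s}\Big)\Big\}
     \frac{a^{2n-1}}{s}, \qquad s > 0,\ a > 0, \\
f_A(a)
  &= \frac{2 a^{2n-1}}{[\Gamma(n)]^2}
     \int_0^\infty \frac{1}{s}
     \exp\Big\{-a\Big(\frac{s}{\theta} + \frac{\theta}{s}\Big)\Big\}\, ds
   = \frac{4 K_0(2a)}{[\Gamma(n)]^2}\, a^{2n-1} .
\end{align*}
```

The last step uses the substitution $s = \theta e^u$ together with the integral representation $K_0(z) = \int_0^\infty e^{-z \cosh u}\, du$, and it shows why $\theta$ drops out of the marginal distribution of $A$.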


PROBLEMS 8.3 


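1. Find a sufficient statistic in each of the following cases based on a random
sample of size $n$:
(a) $X \sim B(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known;
and (iii) $\alpha$, $\beta$ are both unknown.
(b) $X \sim G(\alpha, \beta)$ when (i) $\alpha$ is unknown, $\beta$ known; (ii) $\beta$ is unknown, $\alpha$ known;
and (iii) $\alpha$, $\beta$ are both unknown.
(c) $X \sim P_{N_1, N_2}(x)$, where

$$P_{N_1, N_2}(x) = \frac{1}{N_2 - N_1}, \qquad x = N_1 + 1, N_1 + 2, \ldots, N_2,$$

and $N_1$, $N_2$ ($N_1 < N_2$) are integers, when (i) $N_1$ is known, $N_2$ unknown;
(ii) $N_2$ known, $N_1$ unknown; and (iii) $N_1$, $N_2$ are both unknown.
(d) $X \sim f_\theta(x)$, where

$$f_\theta(x) = \begin{cases} e^{-x + \theta} & \text{if } \theta < x < \infty, \\ 0 & \text{otherwise.} \end{cases}$$

(e) $X \sim f(x; \mu, \sigma)$, where

$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\log x - \mu)^2\right], \qquad x > 0.$$

(f) $X \sim f_\theta(x)$, where

$$f_\theta(x) = P_\theta\{X = x\} = c(\theta)\, 2^{-x/\theta}, \qquad x = \theta, \theta + 1, \ldots,\ \theta > 0,$$

and

$$c(\theta) = 2^{1 - 1/\theta}(2^{1/\theta} - 1).$$

(g) $X \sim P_{\theta, p}(x)$, where

$$P_{\theta, p}(x) = (1 - p)\, p^{x - \theta}, \qquad x = \theta, \theta + 1, \ldots,\ 0 < p < 1,$$

when (i) $p$ is known, $\theta$ unknown; (ii) $p$ is unknown, $\theta$ known; and (iii) $p$, $\theta$
are both unknown.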


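2. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a sample from $\mathcal{N}(\alpha\sigma, \sigma^2)$, where $\alpha$ is a known
real number. Show that the statistic $T(\mathbf{X}) = \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)$ is sufficient
for $\sigma$ but that the family of distributions of $T(\mathbf{X})$ is not complete.

3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$
is clearly sufficient for the family $\mathcal{N}(\mu, \sigma^2)$, $\mu \in \mathcal{R}$, $\sigma > 0$. Is the family of
distributions of $\mathbf{X}$ complete?


4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(\theta - \frac{1}{2}, \theta + \frac{1}{2})$, $\theta \in \mathcal{R}$. Show that the
statistic $T(X_1, \ldots, X_n) = (\min X_i, \max X_i)$ is sufficient for $\theta$ but not complete.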

5. If T = g(U) and T is sufficient, so is U. 


6. In Example 14, show that the class of all functions $g$ for which $E_P\, g(X) = 0$ for
all $P \in \mathcal{P}$ consists of functions of the form

$$g(k) = \begin{cases} 0, & k = 1, 2, \ldots, n_0 - 1, n_0 + 2, n_0 + 3, \ldots, \\ c, & k = n_0, \\ -c, & k = n_0 + 1, \end{cases}$$

where $c$ is a constant.


7. For the class $\{F_{\theta_0}, F_{\theta_1}\}$ of two DFs where $F_{\theta_0}$ is $\mathcal{N}(0, 1)$ and $F_{\theta_1}$ is $\mathcal{C}(1, 0)$, find
a sufficient statistic.


8. Consider the class of hypergeometric probability distributions
$\{P_D : D = 0, 1, 2, \ldots, N\}$, where

$$P_D\{X = x\} = \binom{N}{n}^{-1} \binom{D}{x} \binom{N - D}{n - x}, \qquad x = 0, 1, \ldots, \min\{n, D\}.$$

Show that it is a complete class. If $\mathcal{P} = \{P_D : D = 0, 1, 2, \ldots, N,\ D \ne d,\ d \text{ integral},\ 0 \le d \le N\}$,
is $\mathcal{P}$ complete?

9. Is the family of distributions of the order statistic in sampling from a Poisson 
distribution complete? 


10. Let $(X_1, X_2, \ldots, X_n)$ be a random vector of the discrete type. Is the statistic
$T(X_1, \ldots, X_n) = (X_1, \ldots, X_{n-1})$ sufficient?


11. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with law $\mathcal{L}(X)$. Find
a minimal sufficient statistic in each of the following cases:
(a) $X \sim P(\lambda)$.
(b) $X \sim U[0, \theta]$.
(c) $X \sim NB(1; p)$.
(d) $X \sim P_N$, where $P_N\{X = k\} = 1/N$ if $k = 1, 2, \ldots, N$, and $= 0$ otherwise.
(e) $X \sim \mathcal{N}(\mu, \sigma^2)$.
(f) $X \sim G(\alpha, \beta)$.
(g) $X \sim B(\alpha, \beta)$.
(h) $X \sim f_\theta(x)$, where $f_\theta(x) = (2/\theta^2)(\theta - x)$, $0 < x < \theta$.

12. Let $X_1, X_2$ be a sample of size 2 from $P(\lambda)$. Show that the statistic $X_1 + \alpha X_2$,
where $\alpha > 1$ is an integer, is not sufficient for $\lambda$.


13. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$f_\theta(x) = \begin{cases} \dfrac{x}{\theta}\, e^{-x^2/2\theta} & \text{if } x > 0, \\ 0 & \text{if } x \leq 0. \end{cases}$

Show that $\sum_{i=1}^n X_i^2$ is a minimal sufficient statistic for $\theta$, but $\sum_{i=1}^n X_i$ is not
sufficient.

14. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Show that $\sum_{i=1}^n X_i^2$ is a
minimal sufficient statistic but $\sum_{i=1}^n X_i$ is not sufficient for $\sigma^2$.

15. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF $f_{\alpha,\beta}(x) = \beta e^{-\beta(x - \alpha)}$ if $x > \alpha$,
and $= 0$ if $x \leq \alpha$. Find a minimal sufficient statistic for $(\alpha, \beta)$.

16. Let $T$ be a minimal sufficient statistic. Show that a necessary condition for a
sufficient statistic $U$ to be complete is that $U$ be minimal.


17. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$. Show that $(\bar X, S^2)$ is independent of each
of $(X_{(n)} - X_{(1)})/S$, $(X_{(n)} - \bar X)/S$, and $\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2/S^2$.

18. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$. Show that a necessary and sufficient
condition for $\sum_{i=1}^n a_i X_i$ and $\sum_{i=1}^n X_i$ to be independent is $\sum_{i=1}^n a_i = 0$.

19. Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$.
Show that $X_{(1)}$ is a complete sufficient statistic which is independent of $S^2$.

20. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = (1/\theta) \exp(-x/\theta)$,
$x > 0$, $\theta > 0$. Show that $\bar X$ must be independent of every scale-invariant
statistic, such as $X_1 / \sum_{j=1}^n X_j$.


21. Let $T_1$, $T_2$ be two statistics with common domain $D$. Then $T_1$ is a function of $T_2$
if and only if

for all $x, y \in D$, $\quad T_2(x) = T_2(y) \implies T_1(x) = T_1(y)$.

22. Let $S$ be the support of $f_\theta$, $\theta \in \Theta$, and let $T$ be a statistic such that for
some $\theta_1, \theta_2 \in \Theta$, and $x, y \in S$, $x \neq y$, $T(x) = T(y)$ but $f_{\theta_1}(x) f_{\theta_2}(y) \neq
f_{\theta_2}(x) f_{\theta_1}(y)$. Then show that $T$ is not sufficient for $\theta$.

23. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$. Use the result in Problem 22 to show that
$(\sum_{i=1}^n X_i)^2$ is not sufficient for $\theta$.

24. (a) If $T$ is complete, show that any one-to-one mapping of $T$ is also complete.


(b) Show with the help of an example that a complete statistic is not unique for 
a family of distributions. 


8.4 UNBIASED ESTIMATION 


In this section we focus attention on the class of unbiased estimators. We develop 
a criterion to check if an unbiased estimator is optimal in this class. Using suffi- 
ciency and completeness, we describe a method of constructing uniformly minimum 
variance unbiased estimators. 


Definition 1. Let $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathcal{R}_k$, be a nonempty set of probability
distributions. Let $X = (X_1, X_2, \ldots, X_n)$ be a multiple RV with DF $F_\theta$ and sample
space $\mathcal{X}$. Let $\psi : \Theta \to \mathcal{R}$ be a real-valued parametric function. A Borel-measurable
function $T : \mathcal{X} \to \Theta$ is said to be unbiased for $\psi$ if

(1) $E_\theta T(X) = \psi(\theta)$ for all $\theta \in \Theta$.

Any parametric function $\psi$ for which there exists a $T$ satisfying (1) is called
an estimable function. An estimator that is not unbiased is called biased, and the
function $b(T, \psi)$, defined by
(2) $b(T, \psi) = E_\theta T(X) - \psi(\theta)$,
is called the bias of $T$.


Remark 1. Definition 1, in particular, requires that $E_\theta|T| < \infty$ for all $\theta \in \Theta$
and can be extended to the case when both $\psi$ and $T$ are multidimensional. In most
applications we consider $\Theta \subseteq \mathcal{R}_1$, $\psi(\theta) = \theta$, and $X_1, X_2, \ldots, X_n$ are iid RVs.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from some population with
finite mean. Then $\bar X$ is unbiased for the population mean. If the population variance
is finite, the sample variance $S^2$ is unbiased for the population variance. In general,
if the $k$th population moment $m_k$ exists, the $k$th sample moment is unbiased for $m_k$.

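Note that $S$ is not, in general, unbiased for $\sigma$. If $X_1, X_2, \ldots, X_n$ are iid $\mathcal{N}(\mu, \sigma^2)$
RVs we know that $(n - 1)S^2/\sigma^2$ is $\chi^2(n - 1)$. Therefore,

$E(S\sqrt{n-1}/\sigma) = \int_0^\infty \sqrt{x}\, \dfrac{1}{2^{(n-1)/2}\,\Gamma[(n-1)/2]}\, x^{(n-1)/2 - 1} e^{-x/2}\, dx = \sqrt{2}\, \dfrac{\Gamma(n/2)}{\Gamma[(n-1)/2]}$

and

$E_\sigma(S) = \sigma \sqrt{\dfrac{2}{n-1}}\, \dfrac{\Gamma(n/2)}{\Gamma[(n-1)/2]}.$

The bias of $S$ is given by

$b(S, \sigma) = \sigma \left\{ \sqrt{\dfrac{2}{n-1}}\, \dfrac{\Gamma(n/2)}{\Gamma[(n-1)/2]} - 1 \right\}.$

We note that $b(S, \sigma) \to 0$ as $n \to \infty$, so that $S$ is asymptotically unbiased for $\sigma$.

If $T$ is unbiased for $\theta$, $g(T)$ is not, in general, an unbiased estimator of $g(\theta)$ unless
$g$ is a linear function.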
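
The size of this bias is easy to tabulate. The following is a minimal Python sketch of the bias formula for $S$ in the normal case; the function names `expected_s_over_sigma` and `bias_of_s` are illustrative choices, not notation from the text, and the log-gamma function is used only to avoid overflow for large $n$.

```python
import math


def expected_s_over_sigma(n: int) -> float:
    """Return E(S)/sigma for an iid normal sample of size n >= 2.

    Uses sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), evaluated through
    log-gamma so that large n does not overflow.
    """
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def bias_of_s(n: int, sigma: float = 1.0) -> float:
    """Return b(S, sigma) = E(S) - sigma for an iid normal sample."""
    return sigma * (expected_s_over_sigma(n) - 1.0)


if __name__ == "__main__":
    for n in (2, 5, 20, 100, 1000):
        print(n, bias_of_s(n))
```

The printed biases are negative and shrink toward 0 as $n$ grows, in line with the asymptotic unbiasedness noted above.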


Example 2. Unbiased estimators do not always exist. Consider an RV with PMF
$b(1, p)$. Suppose that we wish to estimate $\psi(p) = p^2$. Then, in order that $T$ be
unbiased for $p^2$, we must have

$p^2 = E_p T = pT(1) + (1 - p)T(0), \quad 0 \leq p \leq 1;$

that is,

$p^2 = p\{T(1) - T(0)\} + T(0)$

must hold for all $p$ in the interval $[0, 1]$, which is impossible. (If a convergent power
series vanishes in an open interval, each of the coefficients must be 0. See also
Problem 1.)


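Example 3. Sometimes an unbiased estimator may be absurd. Let $X$ be $P(\lambda)$ and
$\psi(\lambda) = e^{-3\lambda}$. We show that $T(X) = (-2)^X$ is unbiased for $\psi(\lambda)$. We have

$E_\lambda T(X) = e^{-\lambda} \sum_{x=0}^\infty (-2)^x \dfrac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^\infty \dfrac{(-2\lambda)^x}{x!} = e^{-\lambda} e^{-2\lambda} = \psi(\lambda).$

However, $T(x) = (-2)^x > 0$ if $x$ is even and $< 0$ if $x$ is odd, which is absurd since
$\psi(\lambda) > 0$.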
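
The series identity in Example 3 can be checked numerically. The sketch below truncates the Poisson sum after a fixed number of terms (an assumption that is adequate for moderate $\lambda$); the function name is illustrative.

```python
import math


def expectation_of_minus_two_pow_x(lam: float, terms: int = 200) -> float:
    """Approximate E[(-2)^X] for X ~ Poisson(lam) by a truncated series."""
    total = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for x in range(terms):
        total += ((-2.0) ** x) * pmf
        pmf *= lam / (x + 1)  # recurrence P(X = x + 1) = P(X = x) * lam / (x + 1)
    return total


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        print(lam, expectation_of_minus_two_pow_x(lam), math.exp(-3 * lam))
```

The two printed columns agree to rounding error, even though every individual value $(-2)^x$ of the estimator lies outside the range $(0, 1)$ of $e^{-3\lambda}$.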


Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Then $\bar X$ is unbiased for
$\lambda$ and so also is $S^2$, since both the mean and the variance are equal to $\lambda$. Indeed,
$\alpha \bar X + (1 - \alpha)S^2$, $0 \leq \alpha \leq 1$, is unbiased for $\lambda$.


Let $\theta$ be estimable, and let $T$ be an unbiased estimator of $\theta$. Let $T_1$ be another
unbiased estimator of $\theta$, different from $T$. This means that there exists at least one
$\theta$ such that $P_\theta\{T \neq T_1\} > 0$. In this case there exist infinitely many unbiased
estimators of $\theta$ of the form $\alpha T + (1 - \alpha)T_1$, $0 < \alpha < 1$. It is therefore desirable to
find a procedure to differentiate among these estimators.


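Definition 2. Let $\theta_0 \in \Theta$ and $\mathcal{U}(\theta_0)$ be the class of all unbiased estimators $T$ of
$\theta_0$ such that $E_{\theta_0} T^2 < \infty$. Then $T_0 \in \mathcal{U}(\theta_0)$ is called a locally minimum variance
unbiased estimator (LMVUE) at $\theta_0$ if
(3) $E_{\theta_0}(T_0 - \theta_0)^2 \leq E_{\theta_0}(T - \theta_0)^2$
holds for all $T \in \mathcal{U}(\theta_0)$.

Definition 3. Let $\mathcal{U}$ be the set of all unbiased estimators $T$ of $\theta \in \Theta$ such that
$E_\theta T^2 < \infty$ for all $\theta \in \Theta$. An estimator $T_0 \in \mathcal{U}$ is called a uniformly minimum
variance unbiased estimator (UMVUE) of $\theta$ if

(4) $E_\theta(T_0 - \theta)^2 \leq E_\theta(T - \theta)^2$

for all $\theta \in \Theta$ and every $T \in \mathcal{U}$.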


Remark 2. Let $a_1, a_2, \ldots, a_n$ be any set of real numbers with $\sum_{i=1}^n a_i = 1$.
Let $X_1, X_2, \ldots, X_n$ be independent RVs with common mean $\mu$ and variances $\sigma_k^2$,
$k = 1, 2, \ldots, n$. Then $T = \sum_{i=1}^n a_i X_i$ is an unbiased estimator of $\mu$ with variance
$\sum_{i=1}^n a_i^2 \sigma_i^2$ (see Theorem 4.5.6). $T$ is called a linear unbiased estimator of $\mu$. Linear
unbiased estimators of $\mu$ that have minimum variance (among all linear unbiased
estimators) are called best linear unbiased estimators (BLUEs). In Theorem 4.5.6
(Corollary 2) we have shown that if $X_i$ are iid RVs with common variance $\sigma^2$, the
BLUE of $\mu$ is $\bar X = n^{-1} \sum_{i=1}^n X_i$. If $X_i$ are independent with common mean $\mu$
but different variances $\sigma_i^2$, the BLUE of $\mu$ is obtained if we choose $a_i$ proportional
to $1/\sigma_i^2$; then the minimum variance is $H/n$, where $H$ is the harmonic mean of
$\sigma_1^2, \ldots, \sigma_n^2$ (see Example 4.5.4).
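
As a small illustration of the inverse-variance weighting described in Remark 2, the following Python sketch computes the BLUE weights and the resulting variance exactly with rational arithmetic. The function names and the sample variances 1, 2, 4 are assumptions made for the example only.

```python
from fractions import Fraction


def blue_weights(variances):
    """Weights a_i proportional to 1/sigma_i^2, normalized to sum to 1."""
    inverse = [Fraction(1) / Fraction(v) for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def linear_estimator_variance(weights, variances):
    """Variance sum(a_i^2 * sigma_i^2) of T = sum(a_i * X_i), X_i independent."""
    return sum(w * w * Fraction(v) for w, v in zip(weights, variances))


if __name__ == "__main__":
    variances = [1, 2, 4]
    n = len(variances)
    weights = blue_weights(variances)
    equal = [Fraction(1, n)] * n
    harmonic_mean = n / sum(Fraction(1) / Fraction(v) for v in variances)
    print("BLUE weights:", weights)
    print("BLUE variance:", linear_estimator_variance(weights, variances))
    print("H / n:", harmonic_mean / n)
    print("equal-weight variance:", linear_estimator_variance(equal, variances))
```

For these variances the BLUE variance equals $H/n$ and is smaller than the variance of the equally weighted mean.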


Remark 3. Sometimes the precision of an estimator $T$ of parameter $\theta$ is measured
by the mean square error (MSE). We say that an estimator $T_0$ is at least as
good as any other estimator $T$ in the sense of the MSE if

(5) $E_\theta(T_0 - \theta)^2 \leq E_\theta(T - \theta)^2$ for all $\theta \in \Theta$.

In general, a particular estimator will be better than another for some values of $\theta$ and
worse for others. Definitions 2 and 3 are special cases of this concept if we restrict
attention to unbiased estimators.


The following result gives a necessary and sufficient condition for an unbiased 
estimator to be a UMVUE. 


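Theorem 1. Let $\mathcal{U}$ be the class of all unbiased estimators $T$ of a parameter $\theta \in \Theta$
with $E_\theta T^2 < \infty$ for all $\theta$, and suppose that $\mathcal{U}$ is nonempty. Let $\mathcal{U}_0$ be the class of all
unbiased estimators $v$ of 0, that is,

$\mathcal{U}_0 = \{v : E_\theta v = 0, \ E_\theta v^2 < \infty \text{ for all } \theta \in \Theta\}.$
Then $T_0 \in \mathcal{U}$ is a UMVUE if and only if
(6) $E_\theta(vT_0) = 0$ for all $\theta$ and all $v \in \mathcal{U}_0$.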


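Proof. The conditions of the theorem guarantee the existence of $E_\theta(vT_0)$ for all
$\theta$ and $v \in \mathcal{U}_0$. Suppose that $T_0 \in \mathcal{U}$ is a UMVUE and $E_{\theta_0}(v_0 T_0) \neq 0$ for some $\theta_0$ and
some $v_0 \in \mathcal{U}_0$. Then $T_0 + \lambda v_0 \in \mathcal{U}$ for all real $\lambda$. If $E_{\theta_0} v_0^2 = 0$, then $E_{\theta_0}(v_0 T_0) = 0$
must hold since $P_{\theta_0}\{v_0 = 0\} = 1$. Let $E_{\theta_0} v_0^2 > 0$. Choose $\lambda_0 = -E_{\theta_0}(T_0 v_0)/E_{\theta_0} v_0^2$.
Then

(7) $E_{\theta_0}(T_0 + \lambda_0 v_0)^2 = E_{\theta_0} T_0^2 - \dfrac{E_{\theta_0}^2(v_0 T_0)}{E_{\theta_0} v_0^2} < E_{\theta_0} T_0^2.$

Since $T_0 + \lambda_0 v_0 \in \mathcal{U}$ and $T_0 \in \mathcal{U}$, it follows from (7) that
(8) $\mathrm{var}_{\theta_0}(T_0 + \lambda_0 v_0) < \mathrm{var}_{\theta_0}(T_0),$

which is a contradiction. It follows that (6) holds.
Conversely, let (6) hold for some $T_0 \in \mathcal{U}$, all $\theta \in \Theta$ and all $v \in \mathcal{U}_0$, and let $T \in \mathcal{U}$.
Then $T_0 - T \in \mathcal{U}_0$, and for every $\theta$,
$E_\theta\{T_0(T_0 - T)\} = 0.$
We have
$E_\theta T_0^2 = E_\theta(T T_0) \leq (E_\theta T_0^2)^{1/2} (E_\theta T^2)^{1/2}$

by the Cauchy-Schwarz inequality. If $E_\theta T_0^2 = 0$, then $P_\theta(T_0 = 0) = 1$ and there is
nothing to prove. Otherwise,

$(E_\theta T_0^2)^{1/2} \leq (E_\theta T^2)^{1/2}$
or $\mathrm{var}_\theta(T_0) \leq \mathrm{var}_\theta(T)$. Since $T$ is arbitrary, the proof is complete.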


Theorem 2. Let $\mathcal{U}$ be the nonempty class of unbiased estimators as defined in
Theorem 1. Then there exists at most one UMVUE for $\theta$.


Proof. If $T$ and $T_0 \in \mathcal{U}$ are both UMVUEs, then $T - T_0 \in \mathcal{U}_0$ and
$E_\theta\{T_0(T - T_0)\} = 0$ for all $\theta \in \Theta$,
that is, $E_\theta T_0^2 = E_\theta(T T_0)$, and it follows that
$\mathrm{cov}_\theta(T, T_0) = \mathrm{var}_\theta(T_0)$ for all $\theta$.
Since $T_0$ and $T$ are both UMVUEs, $\mathrm{var}_\theta(T) = \mathrm{var}_\theta(T_0)$, and it follows that the
correlation coefficient between $T$ and $T_0$ is 1. This implies that $P_\theta\{aT + bT_0 = 0\} = 1$
for some $a$, $b$ and all $\theta \in \Theta$. Since $T$ and $T_0$ are both unbiased for $\theta$, we must have
$P_\theta\{T = T_0\} = 1$ for all $\theta$.


Remark 4. Both Theorems 1 and 2 have analogs for LMVUEs at $\theta_0 \in \Theta$, $\theta_0$
fixed.


Theorem 3. If UMVUEs $T_i$ exist for real functions $\psi_i$, $i = 1, 2$, of $\theta$, they also
exist for $\lambda\psi_i$ ($\lambda$ real), as well as for $\psi_1 + \psi_2$, and are given by $\lambda T_i$ and $T_1 + T_2$,
respectively.


Theorem 4. Let $\{T_n\}$ be a sequence of UMVUEs and $T$ be a statistic with
$E_\theta T^2 < \infty$ and such that $E_\theta\{T_n - T\}^2 \to 0$ as $n \to \infty$ for all $\theta \in \Theta$. Then $T$ is
also the UMVUE.


Proof. That $T$ is unbiased follows from $|E_\theta T - \theta| \leq E_\theta|T - T_n| \leq E_\theta^{1/2}(T_n - T)^2$.
For all $v \in \mathcal{U}_0$, all $\theta$, and every $n = 1, 2, \ldots$,

$E_\theta(T_n v) = 0$

by Theorem 1. Therefore,
$E_\theta(vT) = E_\theta(vT) - E_\theta(vT_n) = E_\theta[v(T - T_n)]$
and
$|E_\theta(vT)| \leq (E_\theta v^2)^{1/2}[E_\theta(T - T_n)^2]^{1/2} \to 0$ as $n \to \infty$
for all $\theta$ and all $v \in \mathcal{U}_0$. Thus
$E_\theta(vT) = 0$ for all $v \in \mathcal{U}_0$, all $\theta \in \Theta$,
and by Theorem 1, $T$ must be the UMVUE.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $P(\lambda)$. Then $\bar X$ is the UMVUE of $\lambda$.
Surely, $\bar X$ is unbiased. Let $g$ be an unbiased estimator of 0. Then $T(\bar X) = \bar X + g(\bar X)$
is unbiased for $\lambda$. But $\bar X$ is complete. It follows that

$E_\lambda g(\bar X) = 0$ for all $\lambda > 0 \implies g(x) = 0$ for $x = 0, 1, 2, \ldots.$
Hence $\bar X$ must be the UMVUE of $\lambda$.


Example 6. Sometimes an estimator with larger variance may be preferable. 

Let $X$ be a $G(1, 1/\beta)$ RV. $X$ is usually taken as a good model to describe the time
to failure of a piece of equipment. Let $X_1, X_2, \ldots, X_n$ be a sample of $n$ observations
on $X$. Then $\bar X$ is unbiased for $EX = 1/\beta$ with variance $1/(n\beta^2)$. ($\bar X$ is actually
the UMVUE for $1/\beta$.) Now consider $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$. Then $nX_{(1)}$ is
unbiased for $1/\beta$ with variance $1/\beta^2$, and it has a larger variance than $\bar X$. However,
if the length of time is of importance, $nX_{(1)}$ may be preferable to $\bar X$, since to observe
$nX_{(1)}$ one needs to wait only until the first piece of equipment fails, whereas to
compute $\bar X$ one would have to wait until all the $n$ observations $X_1, X_2, \ldots, X_n$ are
available.


Theorem 5. If a sample consists of $n$ independent observations $X_1, X_2, \ldots, X_n$
from the same distribution, the UMVUE, if it exists, is a symmetric function of the
$X_i$'s.


The proof is left as an exercise.

The converse of Theorem 5 is not true. If $X_1, X_2, \ldots, X_n$ are iid $P(\lambda)$ RVs,
$\lambda > 0$, both $\bar X$ and $S^2$ are unbiased for $\lambda$. But $\bar X$ is the UMVUE, whereas $S^2$ is not.

We now turn our attention to some methods for finding UMVUEs. 


Theorem 6 (Blackwell [9], Rao [85]). Let $\{F_\theta : \theta \in \Theta\}$ be a family of probability
DFs and $h$ be any statistic in $\mathcal{U}$, where $\mathcal{U}$ is the (nonempty) class of all unbiased
estimators of $\theta$ with $E_\theta h^2 < \infty$. Let $T$ be a sufficient statistic for $\{F_\theta, \theta \in \Theta\}$.
Then the conditional expectation $E_\theta\{h \mid T\}$ is independent of $\theta$ and is an unbiased
estimator of $\theta$. Moreover,
(9) $E_\theta(E\{h \mid T\} - \theta)^2 \leq E_\theta(h - \theta)^2$ for all $\theta \in \Theta$.


The equality in (9) holds if and only if $h = E\{h \mid T\}$ (that is, $P_\theta\{h = E\{h \mid T\}\} = 1$
for all $\theta$).


Proof. We have
$E_\theta\{E\{h \mid T\}\} = E_\theta h = \theta.$
It is therefore sufficient to show that
(10) $E_\theta\{E\{h \mid T\}\}^2 \leq E_\theta h^2$ for all $\theta \in \Theta$.
But $E_\theta h^2 = E_\theta\{E\{h^2 \mid T\}\}$, so that it will be sufficient to show that
(11) $[E\{h \mid T\}]^2 \leq E\{h^2 \mid T\}.$
By the Cauchy-Schwarz inequality
$E^2\{h \mid T\} \leq E\{h^2 \mid T\}\,E\{1 \mid T\},$
and (11) follows. The equality holds in (9) if and only if
(12) $E_\theta[E\{h \mid T\}]^2 = E_\theta h^2,$
that is,
$E_\theta[E\{h^2 \mid T\} - E^2\{h \mid T\}] = 0,$
which is the same as
$E_\theta\{\mathrm{var}\{h \mid T\}\} = 0.$
This happens if and only if $\mathrm{var}\{h \mid T\} = 0$, that is, if and only if
$E\{h^2 \mid T\} = E^2\{h \mid T\},$

as will be the case if and only if $h$ is a function of $T$. Thus $h = E\{h \mid T\}$ with
probability 1.
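
To see Theorem 6 at work in the simplest setting, the sketch below takes iid Bernoulli($p$) observations, the crude unbiased estimator $h = X_1$, and the sufficient statistic $T = \sum X_i$, and computes $E\{X_1 \mid T = t\}$ by brute-force enumeration. The Bernoulli setting and the function names are assumptions chosen for illustration; exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product
from math import comb


def rao_blackwellize_x1(n, p):
    """Return {t: E[X_1 | T = t]} for iid Bernoulli(p), T = X_1 + ... + X_n."""
    numerator, denominator = {}, {}
    for xs in product((0, 1), repeat=n):
        t = sum(xs)
        prob = p ** t * (1 - p) ** (n - t)  # joint PMF of the sample point
        numerator[t] = numerator.get(t, 0) + xs[0] * prob
        denominator[t] = denominator.get(t, 0) + prob
    return {t: numerator[t] / denominator[t] for t in denominator}


def variance_of_conditional_mean(n, p):
    """Variance of E[X_1 | T], with T ~ b(n, p)."""
    cond = rao_blackwellize_x1(n, p)
    pmf = {t: comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)}
    mean = sum(cond[t] * pmf[t] for t in pmf)
    return sum((cond[t] - mean) ** 2 * pmf[t] for t in pmf)


if __name__ == "__main__":
    n = 4
    for p in (Fraction(1, 3), Fraction(3, 4)):
        print(p, rao_blackwellize_x1(n, p), variance_of_conditional_mean(n, p))
```

The conditional expectation comes out as $t/n$ for every $p$ (it does not depend on the parameter), and its variance $p(1-p)/n$ is below the variance $p(1-p)$ of $X_1$.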


Theorem 6 is applied along with completeness to yield the following result. 
Theorem 7 (Lehmann-Scheffé [62]). If $T$ is a complete sufficient statistic and
there exists an unbiased estimator $h$ of $\theta$, there exists a unique UMVUE of $\theta$, which
is given by $E\{h \mid T\}$.

Proof. If $h_1, h_2 \in \mathcal{U}$, then $E\{h_1 \mid T\}$ and $E\{h_2 \mid T\}$ are both unbiased and
$E_\theta[E\{h_1 \mid T\} - E\{h_2 \mid T\}] = 0$ for all $\theta \in \Theta$.


Since $T$ is a complete sufficient statistic, it follows that $E\{h_1 \mid T\} = E\{h_2 \mid T\}$. By
Theorem 6, $E\{h \mid T\}$ is the UMVUE.


Remark 5. According to Theorem 6, we should restrict our search to Borel- 
measurable functions of a sufficient statistic (whenever it exists). According to The- 
orem 7, if a complete sufficient statistic T exists, all we need to do is to find a Borel- 
measurable function of T that is unbiased. If a complete sufficient statistic does not 
exist, an UMVUE may still exist (see Example 11). 


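Example 7. Let $X_1, X_2, \ldots, X_n$ be $\mathcal{N}(\theta, 1)$. $X_1$ is unbiased for $\theta$. However,
$\bar X = n^{-1} \sum_{i=1}^n X_i$ is a complete sufficient statistic, so that $E\{X_1 \mid \bar X\}$ is the UMVUE.
We will show that $E\{X_1 \mid \bar X\} = \bar X$. Let $Y = n\bar X$. Then $Y$ is $\mathcal{N}(n\theta, n)$, $X_1$
is $\mathcal{N}(\theta, 1)$, and $(X_1, Y)$ is a bivariate normal RV with variance covariance matrix
$\begin{pmatrix} 1 & 1 \\ 1 & n \end{pmatrix}$. Therefore,

$E\{X_1 \mid y\} = EX_1 + \dfrac{\mathrm{cov}(X_1, Y)}{\mathrm{var}(Y)}(y - EY) = \theta + \dfrac{1}{n}(y - n\theta) = \dfrac{y}{n},$

as asserted.

If we let $\psi(\theta) = \theta^2$, we can show similarly that $\bar X^2 - 1/n$ is the UMVUE for
$\psi(\theta)$. Note that $\bar X^2 - 1/n$ may occasionally be negative, so that an UMVUE for $\theta^2$
is not very sensible in this case.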


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Then $T = \sum_{i=1}^n X_i$ is a
complete sufficient statistic. The UMVUE for $p$ is clearly $\bar X$. To find the UMVUE for
$\psi(p) = p(1 - p)$, we have $E(nT) = n^2 p$, $ET^2 = np + n(n - 1)p^2$, so that
$E\{nT - T^2\} = n(n - 1)p(1 - p)$, and it follows that $(nT - T^2)/n(n - 1)$ is the
UMVUE for $\psi(p) = p(1 - p)$.
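
The unbiasedness claim in Example 8 can be confirmed exactly by summing over the binomial distribution of $T$. This is only a sketch; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def umvue_p_times_q(t, n):
    """(n*t - t^2) / (n*(n-1)), the estimator of p(1-p) from Example 8."""
    return Fraction(n * t - t * t, n * (n - 1))


def expected_value(n, p):
    """E_p of the estimator, with T ~ b(n, p), computed by direct summation."""
    return sum(
        comb(n, t) * p ** t * (1 - p) ** (n - t) * umvue_p_times_q(t, n)
        for t in range(n + 1)
    )


if __name__ == "__main__":
    p = Fraction(1, 4)
    print(expected_value(5, p), p * (1 - p))
```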


Example 9. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then $(\bar X, S^2)$
is a complete sufficient statistic for $(\mu, \sigma^2)$. $\bar X$ is the UMVUE for $\mu$, and $S^2$ is the
UMVUE for $\sigma^2$. Also, $k(n)S$ is the UMVUE for $\sigma$, where $k(n) = \sqrt{(n - 1)/2}\;\Gamma[(n - 1)/2]/\Gamma(n/2)$.
We wish to find the UMVUE for the $p$th quantile $\mathfrak{z}_p$. We have

$p = P\{X \leq \mathfrak{z}_p\} = P\left\{Z \leq \dfrac{\mathfrak{z}_p - \mu}{\sigma}\right\},$
where $Z$ is $\mathcal{N}(0, 1)$. Thus $\mathfrak{z}_p = \sigma z_{1-p} + \mu$, and the UMVUE is
$T(X_1, X_2, \ldots, X_n) = z_{1-p}\,k(n)S + \bar X.$
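Example 10 (Stigler [109]). We return to Example 8.3.14. We have seen that the
family $\{P_N^{(n)} : N \geq 1\}$ of PMFs of $X_{(n)} = \max_{1 \leq i \leq n} X_i$ is complete and $X_{(n)}$ is
sufficient for $N$. Now $EX_1 = (N + 1)/2$, so that $T(X_1) = 2X_1 - 1$ is unbiased for
$N$. It follows from Theorem 7 that $E\{T(X_1) \mid X_{(n)}\}$ is the UMVUE of $N$. We have

$P\{X_1 = x_1 \mid X_{(n)} = y\} = \begin{cases} \dfrac{y^{n-1} - (y - 1)^{n-1}}{y^n - (y - 1)^n} & \text{if } x_1 = 1, 2, \ldots, y - 1, \\ \dfrac{y^{n-1}}{y^n - (y - 1)^n} & \text{if } x_1 = y. \end{cases}$

Thus

$E\{T(X_1) \mid X_{(n)} = y\} = \dfrac{y^{n-1} - (y - 1)^{n-1}}{y^n - (y - 1)^n} \sum_{x_1=1}^{y-1} (2x_1 - 1) + (2y - 1)\dfrac{y^{n-1}}{y^n - (y - 1)^n} = \dfrac{y^{n+1} - (y - 1)^{n+1}}{y^n - (y - 1)^n}$

is the UMVUE of $N$.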
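
A quick exact check of this UMVUE: the following Python sketch evaluates the estimator and verifies unbiasedness by summing over the PMF of the maximum, $P\{X_{(n)} = y\} = [y^n - (y-1)^n]/N^n$. The function names are illustrative, not from the text.

```python
from fractions import Fraction


def umvue_of_N(y, n):
    """[y^(n+1) - (y-1)^(n+1)] / [y^n - (y-1)^n], evaluated at X_(n) = y."""
    return Fraction(y ** (n + 1) - (y - 1) ** (n + 1), y ** n - (y - 1) ** n)


def pmf_of_max(y, n, N):
    """P{X_(n) = y} for n iid draws, uniform on {1, ..., N}."""
    return Fraction(y ** n - (y - 1) ** n, N ** n)


def expected_umvue(n, N):
    """E_N of the UMVUE, by direct summation over y = 1, ..., N."""
    return sum(umvue_of_N(y, n) * pmf_of_max(y, n, N) for y in range(1, N + 1))


if __name__ == "__main__":
    print(expected_umvue(3, 7))
    print([umvue_of_N(y, 1) for y in range(1, 5)])
```

The sum telescopes, so the expectation is exactly $N$; with $n = 1$ the estimator reduces to $2y - 1$, that is, to $T(X_1)$ itself.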


If we consider the family $\mathcal{P}$ instead, we have seen (Example 8.3.14 and
Problem 8.3.6) that $\mathcal{P}$ is not complete. The UMVUE for the family $\{P_N : N \geq 1\}$ is
$T(X_1) = 2X_1 - 1$, which is not the UMVUE for $\mathcal{P}$. The UMVUE for $\mathcal{P}$ is, in fact,
given by

$T_1(k) = \begin{cases} 2k - 1, & k \neq n_0, \ k \neq n_0 + 1, \\ 2n_0, & k = n_0, \ k = n_0 + 1. \end{cases}$

The reader is asked to check that $T_1$ has covariance 0 with all unbiased estimators $g$
of 0 that are of the form described in Example 8.3.14 and Problem 8.3.6, and hence
Theorem 1 implies that $T_1$ is the UMVUE. Actually, $T_1(X_1)$ is a complete sufficient
statistic for $\mathcal{P}$. Since $E_{n_0} T_1(X_1) = n_0 + 1/n_0$, $T_1$ is not even unbiased for the family
$\{P_N : N \geq 1\}$. The minimum variance is given by

$\mathrm{var}_N(T_1(X_1)) = \begin{cases} \mathrm{var}_N(T(X_1)) & \text{if } N < n_0, \\ \mathrm{var}_N(T(X_1)) - \dfrac{2}{N} & \text{if } N > n_0. \end{cases}$
The following example shows that a UMVUE may exist, whereas a complete
sufficient statistic may not.


Example 11. Let $X$ be an RV with PMF
$P_\theta(X = -1) = \theta$ and $P_\theta(X = x) = (1 - \theta)^2\theta^x$,
$x = 0, 1, 2, \ldots$, where $0 < \theta < 1$. Let $\psi(\theta) = P_\theta(X = 0) = (1 - \theta)^2$. Then $X$ is
clearly sufficient, in fact minimal sufficient, for $\theta$ but since

$E_\theta X = (-1)\theta + \sum_{x=0}^\infty x(1 - \theta)^2\theta^x = -\theta + \theta(1 - \theta)^2 \dfrac{d}{d\theta} \sum_{x=1}^\infty \theta^x = 0,$

it follows that $X$ is not complete for $\{P_\theta : 0 < \theta < 1\}$. We will use Theorem 1 to
check if a UMVUE for $\psi(\theta)$ exists. Suppose that

$E_\theta h(X) = h(-1)\theta + \sum_{x=0}^\infty (1 - \theta)^2\theta^x h(x) = 0$

for all $0 < \theta < 1$. Then, for $0 < \theta < 1$,

$0 = \theta h(-1) + \sum_{x=0}^\infty \theta^x h(x) - 2\sum_{x=0}^\infty \theta^{x+1} h(x) + \sum_{x=0}^\infty \theta^{x+2} h(x)$
$\;\; = h(0) + \sum_{x=0}^\infty \theta^{x+1}[h(x + 1) - 2h(x) + h(x - 1)],$

which is a power series in $\theta$.
It follows that $h(0) = 0$, and for $x \geq 0$, $h(x + 1) - 2h(x) + h(x - 1) = 0$. Thus

$h(1) = -h(-1), \quad h(2) = 2h(1) - h(0) = -2h(-1),$
$h(3) = 2h(2) - h(1) = -4h(-1) + 2h(-1) = -3h(-1),$
and so on. Consequently, all unbiased estimators of zero are of the form $h(X) = cX$.
Clearly, $T(X) = 1$ if $X = 0$, and $= 0$ otherwise is unbiased for $\psi(\theta)$. Moreover, for
all $\theta$,
$E_\theta\{cX \cdot T(X)\} = 0,$
so that $T$ is UMVUE of $\psi(\theta)$.


We conclude this section with a proof of Theorem 8.3.4. 
Theorem 8. (Theorem 8.3.4) A complete sufficient statistic is a minimal
sufficient statistic.


Proof. Let $S(X)$ be a complete sufficient statistic for $\{f_\theta : \theta \in \Theta\}$ and let $T$ be
any statistic for which $E_\theta|T^2| < \infty$. Writing $h(S) = E_\theta\{T \mid S\}$, we see that $h$ is the
UMVUE of $E_\theta T$. Let $S_1(X)$ be another sufficient statistic. We show that $h(S)$ is a
function of $S_1$. If not, then $h_1(S_1) = E_\theta\{h(S) \mid S_1\}$ is unbiased for $E_\theta T$ and by the
Rao-Blackwell theorem,

$\mathrm{var}_\theta\, h_1(S_1) < \mathrm{var}_\theta\, h(S),$

contradicting the fact that $h(S)$ is UMVUE for $E_\theta T$. It follows that $h(S)$ is a function
of $S_1$. Since $h$ and $S_1$ are arbitrary, $S$ must be a function of every sufficient statistic
and hence, minimal sufficient.


PROBLEMS 8.4 


1. Let $X_1, X_2, \ldots, X_n$ ($n \geq 2$) be a sample from $b(1, p)$. Find an unbiased
estimator for $\psi(p) = p^2$.


2. Let $X_1, X_2, \ldots, X_n$ ($n \geq 2$) be a sample from $\mathcal{N}(\mu, \sigma^2)$. Find an unbiased
estimator for $\sigma^p$, where $p + n > 1$. Find a minimum MSE estimator of $\sigma^p$.


3. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Find a minimum MSE estimator of
the form $\alpha S^2$ for the parameter $\sigma^2$. Compare the variances of the minimum MSE
estimator and the obvious estimator $S^2$.


4. Let $X \sim b(1, \theta^2)$. Does there exist an unbiased estimator of $\theta$?

5. Let $X \sim P(\lambda)$. Does there exist an unbiased estimator of $\psi(\lambda) = \lambda^{-1}$?


6. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, p)$, $0 < p < 1$, and $0 \leq s \leq n$ be an
integer. Find the UMVUE for (a) $\psi(p) = p^s$, and (b) $\psi(p) = p^s + (1 - p)^{n-s}$.


7. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with mean $\theta$ and finite
variance, and $T$ be an estimator of $\theta$ of the form $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n \alpha_i X_i$.
If $T$ is an unbiased estimator of $\theta$ that has minimum variance and $T'$ is another
linear unbiased estimator of $\theta$, then

$\mathrm{cov}_\theta(T, T') = \mathrm{var}_\theta(T).$


8. Let $T_1$, $T_2$ be two unbiased estimators having common variance $\alpha\sigma^2$ ($\alpha > 1$),
where $\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient
between $T_1$ and $T_2$ is $\geq (2 - \alpha)/\alpha$.


9. Let $X \sim NB(1; \theta)$ and $d(\theta) = P_\theta\{X = 0\}$. Let $X_1, X_2, \ldots, X_n$ be a sample
on $X$. Find the UMVUE of $d(\theta)$.
10. This example covers most discrete distributions. Let $X_1, X_2, \ldots, X_n$ be a
sample from the PMF

$P_\theta\{X = x\} = \dfrac{\alpha(x)\theta^x}{f(\theta)}, \quad x = 0, 1, 2, \ldots,$

where $\theta > 0$, $\alpha(x) > 0$, $f(\theta) = \sum_{x=0}^\infty \alpha(x)\theta^x$, $\alpha(0) = 1$, and let $T = X_1 +
X_2 + \cdots + X_n$. Write

$c(t, n) = \sum_{x_1, x_2, \ldots, x_n} \prod_{i=1}^n \alpha(x_i) \quad$ with $\quad \sum_{i=1}^n x_i = t.$

Show that $T$ is a complete sufficient statistic for $\theta$ and that the UMVUE for
$d(\theta) = \theta^r$ ($r > 0$ is an integer) is given by

$Y_r(t) = \begin{cases} 0 & \text{if } t < r, \\ \dfrac{c(t - r, n)}{c(t, n)} & \text{if } t \geq r. \end{cases}$

(Roy and Mitra [92])
11. Let $X$ be a hypergeometric RV with PMF

$P_M\{X = x\} = \binom{M}{x}\binom{N - M}{n - x} \bigg/ \binom{N}{n},$

where $\max(0, M + n - N) \leq x \leq \min(M, n)$.
(a) Find the UMVUE for $M$ when $N$ is assumed to be known.
(b) Does there exist an unbiased estimator of $N$ ($M$ known)?


12. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, 1/\lambda)$ RVs, $\lambda > 0$. Find the UMVUE of $P_\lambda\{X_1 \leq
t_0\}$, where $t_0 > 0$ is a fixed real number.


13. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Let $\psi(\lambda) = \sum_{k=0}^\infty c_k\lambda^k$
be a parametric function. Find the UMVUE for $\psi(\lambda)$. In particular, find the
UMVUE for (a) $\psi(\lambda) = 1/(1 - \lambda)$, (b) $\psi(\lambda) = \lambda^s$ for some fixed integer $s > 0$,
(c) $\psi(\lambda) = P_\lambda\{X = 0\}$, and (d) $\psi(\lambda) = P_\lambda\{X = 0 \text{ or } 1\}$.


14. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF

$P_N(x) = \dfrac{1}{N}, \quad x = 1, 2, \ldots, N.$

Let $\psi(N)$ be a function of $N$. Find the UMVUE of $\psi(N)$.

15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. Find the UMVUE of
$\psi(\lambda) = P_\lambda\{X = k\}$, where $k$ is a fixed positive integer.


16. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate normal
population with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Assume that $\mu_1 = \mu_2 = \mu$,
and it is required to find an unbiased estimator of $\mu$. Since a complete sufficient
statistic does not exist, consider the class of all linear unbiased estimators

$\hat\mu(\alpha) = \alpha\bar X + (1 - \alpha)\bar Y.$

(a) Find the variance of $\hat\mu$.
(b) Choose $\alpha = \alpha_0$ to minimize $\mathrm{var}(\hat\mu)$, and consider the estimator

$\hat\mu_0 = \alpha_0\bar X + (1 - \alpha_0)\bar Y.$

Compute $\mathrm{var}(\hat\mu_0)$. If $\sigma_1 = \sigma_2$, the BLUE of $\mu$ (in the sense of minimum
variance) is

$\hat\mu_1 = \dfrac{\bar X + \bar Y}{2}$

irrespective of whether $\sigma_1$ and $\rho$ are known or unknown.

(c) If $\sigma_1 \neq \sigma_2$ and $\rho$, $\sigma_1$, $\sigma_2$ are unknown, replace these values in $\alpha_0$ by their
corresponding estimators. Let

$\hat\alpha = \dfrac{S_2^2 - S_{11}}{S_1^2 + S_2^2 - 2S_{11}}.$

Show that
$\hat\mu_2 = \bar Y + (\bar X - \bar Y)\hat\alpha$
is an unbiased estimator of $\mu$.


17. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\theta, 1)$. Let $p = \Phi(x - \theta)$, where $\Phi$ is the DF of a
$\mathcal{N}(0, 1)$ RV. Show that the UMVUE of $p$ is given by $\Phi\left((x - \bar X)\sqrt{n/(n - 1)}\right)$.


18. Prove Theorem 5. 


19. In Example 10 show that $T_1$ is the UMVUE for $N$ (restricted to the family $\mathcal{P}$),
and compute the minimum variance.


20. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample from a bivariate population with finite
variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\gamma$. Show that

$\mathrm{var}(S_{11}) = \dfrac{1}{n}\left(\mu_{22} - \dfrac{n - 2}{n - 1}\gamma^2 + \dfrac{\sigma_1^2\sigma_2^2}{n - 1}\right),$

where $\mu_{22} = E[(X - EX)^2(Y - EY)^2]$. It is assumed that appropriate order
moments exist.


21. Suppose that a random sample is taken on $(X, Y)$ and it is desired to estimate $\gamma$,
the unknown covariance between $X$ and $Y$. Suppose that for some reason a set $S$
of $n$ observations is available on both $X$ and $Y$, an additional $n_1 - n$ observations
are available on $X$ but the corresponding $Y$ values are missing, and an additional
$n_2 - n$ observations of $Y$ are available for which the $X$ values are missing. Let
$S_1$ be the set of all $n_1$ ($\geq n$) $X$ values, and $S_2$, the set of all $n_2$ ($\geq n$) $Y$ values,
and write

$\hat X = \dfrac{\sum_{j \in S_1} X_j}{n_1}, \quad \hat Y = \dfrac{\sum_{j \in S_2} Y_j}{n_2}, \quad \bar X = \dfrac{\sum_{i \in S} X_i}{n}, \quad \bar Y = \dfrac{\sum_{i \in S} Y_i}{n}.$

Show that

$\hat\gamma = \dfrac{n_1 n_2}{n(n_1 n_2 - n_1 - n_2 + n)} \sum_{i \in S} (X_i - \hat X)(Y_i - \hat Y)$

is an unbiased estimator of $\gamma$. Find the variance of $\hat\gamma$, and show that $\mathrm{var}(\hat\gamma) \leq
\mathrm{var}(S_{11})$, where $S_{11}$ is the usual unbiased estimator of $\gamma$ based on the $n$
observations in $S$. (Boas [10])


22. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = \exp(-x + \theta)$, $x > \theta$.
Let $x_0$ be a fixed real number. Find the UMVUE of $f_\theta(x_0)$.


23. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. Let $T(X) = \sum_{i=1}^n X_i$. Show that
$\varphi(x; t/n, (n - 1)/n)$ is the UMVUE of $\varphi(x; \mu, 1)$, where $\varphi(x; \mu, \sigma^2)$ is the PDF
of a $\mathcal{N}(\mu, \sigma^2)$ RV.


24. Let $X_1, X_2, \ldots, X_n$ be iid $G(1, \theta)$ RVs. Show that the UMVUE of $f(x; \theta) =
(1/\theta) \exp(-x/\theta)$, $x > 0$, is given by $h(x \mid t)$, the conditional PDF of $X_1$ given
$T(X) = \sum_{i=1}^n X_i = t$, where

$h(x \mid t) = (n - 1)(t - x)^{n-2}/t^{n-1}$ for $x < t$ and $= 0$ for $x > t$.

25. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f_\theta(x) = 1/(2\theta)$, $|x| < \theta$,
and $= 0$ elsewhere. Show that $T(X) = \max\{-X_{(1)}, X_{(n)}\}$ is a complete
sufficient statistic for $\theta$. Find the UMVUE of $\theta^r$.


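26. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

$f_\theta(x) = \dfrac{1}{\sigma} \exp\left[-\dfrac{x - \mu}{\sigma}\right], \quad x > \mu, \ \sigma > 0,$

where $\theta = (\mu, \sigma)$.
(a) Show that $\left(X_{(1)}, \sum_{j=1}^n (X_j - X_{(1)})\right)$ is a complete sufficient statistic for $\theta$.
(b) Show that the UMVUEs of $\mu$ and $\sigma$ are given by

$\hat\mu = X_{(1)} - \dfrac{1}{n(n - 1)} \sum_{j=1}^n (X_j - X_{(1)}), \quad \hat\sigma = \dfrac{1}{n - 1} \sum_{j=1}^n (X_j - X_{(1)}).$

(c) Find the UMVUE of $\psi(\mu, \sigma) = E_{\mu,\sigma} X_1$.
(d) Show that the UMVUE of $P_\theta\{X_1 \geq t\}$ is given by

$\dfrac{n - 1}{n} \left\{\left[1 - \dfrac{t - X_{(1)}}{\sum_{j=1}^n (X_j - X_{(1)})}\right]^+\right\}^{n-2},$

where $x^+ = \max(x, 0)$.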


8.5 UNBIASED ESTIMATION (CONTINUED): LOWER BOUND FOR 
THE VARIANCE OF AN ESTIMATOR 


In this section we consider two inequalities, each of which provides a lower bound
for the variance of an estimator. These inequalities can sometimes be used to show
that an unbiased estimator is the UMVUE. We first consider an inequality due to
Fréchet, Cramér, and Rao (the FCR inequality).


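Theorem 1 (Cramér [17], Fréchet [31], Rao [84]). Let $\Theta \subseteq \mathcal{R}$ be an open
interval and suppose that the family $\{f_\theta : \theta \in \Theta\}$ satisfies the following regularity
conditions:


(i) It has common support set $S$. Thus $S = \{x : f_\theta(x) > 0\}$ does not depend
on $\theta$.

(ii) For $x \in S$ and $\theta \in \Theta$, the derivative $\dfrac{\partial}{\partial\theta} \log f_\theta(x)$ exists and is finite.

(iii) For any statistic $h$ with $E_\theta|h(X)| < \infty$ for all $\theta$, the operations of integration
(summation) and differentiation with respect to $\theta$ can be interchanged in
$E_\theta h(X)$. That is,

(1) $\dfrac{\partial}{\partial\theta} \int h(x) f_\theta(x)\, dx = \int h(x) \dfrac{\partial}{\partial\theta} f_\theta(x)\, dx$

whenever the right-hand side of (1) is finite.
Let $T(X)$ be such that $\mathrm{var}_\theta\, T(X) < \infty$ for all $\theta$ and set $\psi(\theta) = E_\theta T(X)$. If
$I(\theta) = E_\theta\left[\dfrac{\partial}{\partial\theta} \log f_\theta(X)\right]^2$ satisfies $0 < I(\theta) < \infty$, then

(2) $\mathrm{var}_\theta\, T(X) \geq \dfrac{[\psi'(\theta)]^2}{I(\theta)}.$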
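Proof. Since (iii) holds for $h \equiv 1$, we get

(3) $0 = \int_S \dfrac{\partial}{\partial\theta} f_\theta(x)\, dx = \int_S \left[\dfrac{\partial}{\partial\theta} \log f_\theta(x)\right] f_\theta(x)\, dx = E_\theta\left[\dfrac{\partial}{\partial\theta} \log f_\theta(X)\right].$

Differentiating $\psi(\theta) = E_\theta T(X)$ and using (1), we get

(4) $\psi'(\theta) = \int_S T(x) \dfrac{\partial}{\partial\theta} f_\theta(x)\, dx = \int_S \left[T(x) \dfrac{\partial}{\partial\theta} \log f_\theta(x)\right] f_\theta(x)\, dx = \mathrm{cov}\left(T(X), \dfrac{\partial}{\partial\theta} \log f_\theta(X)\right).$

Also, in view of (3), we have

$\mathrm{var}_\theta\left(\dfrac{\partial}{\partial\theta} \log f_\theta(X)\right) = E_\theta\left[\dfrac{\partial}{\partial\theta} \log f_\theta(X)\right]^2,$

and using the Cauchy-Schwarz inequality in (4), we get

$[\psi'(\theta)]^2 \leq \mathrm{var}_\theta\, T(X)\; E_\theta\left[\dfrac{\partial}{\partial\theta} \log f_\theta(X)\right]^2,$

which proves (2). Practically the same proof may be given when $f_\theta$ is a PMF by
replacing $\int$ by $\sum$.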


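Remark 1. If, in particular, $\psi(\theta) = \theta$, then (2) reduces to

(5) $\mathrm{var}_\theta(T(X)) \geq \dfrac{1}{I(\theta)}.$


Remark 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF (PMF) $f_\theta(x)$.
Then

$I(\theta) = E_\theta\left[\dfrac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = \sum_{i=1}^n E_\theta\left[\dfrac{\partial \log f_\theta(X_i)}{\partial\theta}\right]^2 = nE_\theta\left[\dfrac{\partial \log f_\theta(X_1)}{\partial\theta}\right]^2 = nI_1(\theta),$

where $I_1(\theta) = E_\theta[\partial \log f_\theta(X_1)/\partial\theta]^2$. In this case the inequality (2) reduces to

$\mathrm{var}_\theta(T(X)) \geq \dfrac{[\psi'(\theta)]^2}{nI_1(\theta)}.$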


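Definition 1. The quantity

(6) $I_1(\theta) = E_\theta\left[\dfrac{\partial \log f_\theta(X_1)}{\partial\theta}\right]^2$

is called Fisher information in $X_1$, and

(7) $I_n(\theta) = E_\theta\left[\dfrac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = nI_1(\theta)$

is known as Fisher information in the random sample $X_1, X_2, \ldots, X_n$.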


Remark 3. As $n$ gets larger, the lower bound for $\mathrm{var}_\theta(T(X))$ gets smaller. Thus,
as the Fisher information increases, the lower bound decreases and the “best”
estimator [one for which equality holds in (2)] will have smaller variance, consequently
more information about $\theta$.


Remark 4. Regularity condition (i) is unnecessarily restrictive. An examination
of the proof shows that it is only necessary that (ii) and (iii) hold for (2) to hold.
Condition (i) excludes distributions such as $f_\theta(x) = 1/\theta$, $0 < x < \theta$, for which
(3) fails to hold. It also excludes densities such as $f_\theta(x) = 1$, $\theta < x < \theta + 1$, or
$f_\theta(x) = (2/\pi) \sin^2(x + \pi)$, $\theta \leq x \leq \theta + \pi$, each of which satisfies (iii) for $h \equiv 1$,
so that (3) holds but not (1) for all $h$ with $E_\theta|h| < \infty$.


Remark 5. Sufficient conditions for regularity condition (iii) may be found in
most calculus textbooks. For example, if (i) and (ii) hold, then (iii) holds provided
that for all $h$ with $E_\theta|h| < \infty$ for all $\theta \in \Theta$, both $E_\theta\{h(X)[\partial \log f_\theta(X)/\partial\theta]\}$ and
$E_\theta|h(X)[\partial f_\theta(X)/\partial\theta]|$ are continuous functions of $\theta$. Regularity conditions (i) to (iii)
are satisfied for a one-parameter exponential family.


Remark 6. The inequality (2) holds trivially if $I(\theta) = \infty$ [and $\psi'(\theta)$ is finite]
or if $\mathrm{var}_\theta(T(X)) = \infty$.


Example 1. Let $X \sim b(n, p)$; $\Theta = (0, 1) \subset \mathcal{R}$. Here the Fisher information may
be obtained as follows:

$\log f_p(x) = \log\binom{n}{x} + x \log p + (n - x) \log(1 - p),$

$\dfrac{\partial \log f_p(x)}{\partial p} = \dfrac{x}{p} - \dfrac{n - x}{1 - p},$

and

$E_p\left[\dfrac{\partial \log f_p(X)}{\partial p}\right]^2 = \dfrac{n}{p(1 - p)} = I(p).$

Let $\psi(p)$ be a function of $p$ and $T(X)$ be an unbiased estimator of $\psi(p)$. The only
condition that need be checked is differentiability under the summation sign. We
have

$\psi(p) = E_p T(X) = \sum_{x=0}^n \binom{n}{x} T(x) p^x (1 - p)^{n-x},$

which is a polynomial in $p$ and hence can be differentiated with respect to $p$. For any
unbiased estimator $T(X)$ of $p$, we have

$\mathrm{var}_p(T(X)) \geq \dfrac{1}{n}\, p(1 - p) = \dfrac{1}{I(p)},$

and since

$\mathrm{var}\left(\dfrac{X}{n}\right) = \dfrac{np(1 - p)}{n^2} = \dfrac{p(1 - p)}{n},$

it follows that the variance of the estimator $X/n$ attains the lower bound of the FCR
inequality, and hence $T(X) = X/n$ has least variance among all unbiased estimators of $p$.
Thus $T(X) = X/n$ is the UMVUE for $p$.
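
The Fisher information in Example 1 can be verified directly from its definition as the expected squared score. The sketch below sums over the binomial PMF in exact rational arithmetic; the function name is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def binomial_fisher_information(n, p):
    """E_p[(d/dp log f_p(X))^2] for X ~ b(n, p), by direct summation."""
    info = Fraction(0)
    for x in range(n + 1):
        pmf = comb(n, x) * p ** x * (1 - p) ** (n - x)
        score = Fraction(x) / p - Fraction(n - x) / (1 - p)
        info += score ** 2 * pmf
    return info


if __name__ == "__main__":
    n, p = 6, Fraction(1, 3)
    info = binomial_fisher_information(n, p)
    print("I(p) =", info, " n/(p(1-p)) =", n / (p * (1 - p)))
    print("FCR bound 1/I(p) =", 1 / info, " var(X/n) =", p * (1 - p) / n)
```

The bound $1/I(p)$ and the variance of $X/n$ coincide, which is the attainment claimed in the example.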


Example 2. Let $X \sim P(\lambda)$. We leave the reader to check that the regularity
conditions are satisfied and

$\mathrm{var}_\lambda(T(X)) \geq \lambda.$

Since $T(X) = X$ has variance $\lambda$, $X$ is the UMVUE of $\lambda$. Similarly, if we take a
sample of size $n$ from $P(\lambda)$, we can show that

$I_n(\lambda) = \dfrac{n}{\lambda}$ and $\mathrm{var}_\lambda(T(X_1, \ldots, X_n)) \geq \dfrac{\lambda}{n},$

and $\bar X$ is the UMVUE.
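Let us next consider the problem of unbiased estimation of $\psi(\lambda) = e^{-\lambda}$, based
on a sample of size 1. The estimator

$\delta(X) = \begin{cases} 1 & \text{if } X = 0, \\ 0 & \text{if } X \geq 1, \end{cases}$

is unbiased for $\psi(\lambda)$ since
$E_\lambda\delta(X) = E_\lambda[\delta(X)]^2 = P_\lambda\{X = 0\} = e^{-\lambda}.$
Also,
$\mathrm{var}_\lambda(\delta(X)) = e^{-\lambda}(1 - e^{-\lambda}).$
To compute the FCR lower bound, we have
$\log f_\lambda(x) = x \log\lambda - \lambda - \log x!.$


This has to be differentiated with respect to $e^{-\lambda}$, since we want a lower bound for an
estimator of the parameter $e^{-\lambda}$. Let $\theta = e^{-\lambda}$. Then

$\log f_\theta(x) = x \log\log\dfrac{1}{\theta} + \log\theta - \log x!,$

$\dfrac{\partial}{\partial\theta} \log f_\theta(x) = x\,\dfrac{1}{\theta\log\theta} + \dfrac{1}{\theta},$

and

$E_\theta\left[\dfrac{\partial}{\partial\theta} \log f_\theta(X)\right]^2 = \dfrac{1}{\theta^2}\, E_\theta\left[1 + \dfrac{X}{\log\theta}\right]^2 = e^{2\lambda}\, E_\lambda\left[1 - \dfrac{X}{\lambda}\right]^2 = e^{2\lambda}\, \dfrac{\mathrm{var}_\lambda(X)}{\lambda^2} = \dfrac{e^{2\lambda}}{\lambda} = I(e^{-\lambda}),$

so that

$\mathrm{var}_\theta\, T(X) \geq \dfrac{\lambda}{e^{2\lambda}} = \dfrac{1}{I(e^{-\lambda})},$

where $\theta = e^{-\lambda}$.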


Since $e^{-\lambda}(1 - e^{-\lambda}) > \lambda e^{-2\lambda}$ for $\lambda > 0$, we see that $\mathrm{var}(\delta(X))$ is greater than the
lower bound obtained from the FCR inequality. We show next that $\delta(X)$ is the only
unbiased estimator of $\theta$, and hence is the UMVUE.

If $h$ is any unbiased estimator of $\theta$, it must satisfy $E_\theta h(X) = \theta$. That is, for all
$\lambda > 0$,

$e^{-\lambda} = \sum_{k=0}^\infty h(k)\, e^{-\lambda}\, \dfrac{\lambda^k}{k!}.$

Equating coefficients of powers of $\lambda$ we see immediately that $h(0) = 1$ and $h(k) = 0$
for $k = 1, 2, \ldots$. It follows that $h(X) = \delta(X)$.

The same computation can be carried out when $X_1, X_2, \ldots, X_n$ is a random
sample from $P(\lambda)$. We leave the reader to show that the FCR lower bound for any
unbiased estimator of $\theta = e^{-\lambda}$ is $\lambda e^{-2\lambda}/n$. The estimator $\sum_{i=1}^n \delta(X_i)/n$ is clearly
unbiased for $e^{-\lambda}$ with variance $e^{-\lambda}(1 - e^{-\lambda})/n > (\lambda e^{-2\lambda})/n$. The UMVUE of $e^{-\lambda}$
is given by $T_0 = [(n - 1)/n]^{\sum_{i=1}^n X_i}$ with $\mathrm{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1) > (\lambda e^{-2\lambda})/n$
for all $\lambda > 0$.


Corollary. Let $X_1, X_2, \dots, X_n$ be iid with common PDF $f_\theta(x)$. Suppose that
the family $\{f_\theta : \theta \in \Theta\}$ satisfies the conditions of Theorem 1. Then equality holds
in (2) if and only if, for all $\theta \in \Theta$,

$$T(\mathbf{x}) - \psi(\theta) = k(\theta)\,\frac{\partial}{\partial\theta}\log f_\theta(\mathbf{x}) \tag{8}$$

for some function $k(\theta)$.


Proof. Recall that we derived (2) by an application of the Cauchy-Schwarz
inequality, where equality holds if and only if (8) holds.


Remark 7. Integrating (8) with respect to $\theta$, we get

$$\log f_\theta(\mathbf{x}) = Q(\theta)T(\mathbf{x}) + S(\theta) + A(\mathbf{x})$$

for some functions $Q$, $S$, and $A$. It follows that $f_\theta$ is a one-parameter exponential
family and the statistic $T$ is sufficient for $\theta$.


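Remark 8. A result that simplifies computations is the following. If $f_\theta$ is twice
differentiable and $E_\theta\{\partial\log f_\theta(X)/\partial\theta\}$ can be differentiated under the
expectation sign, then

$$I(\theta) = E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2
= -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(X)\right]. \tag{9}$$

For the proof of (9), it is straightforward to check that

$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(x)
= \frac{\partial^2 f_\theta(x)/\partial\theta^2}{f_\theta(x)} - \left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right]^2.$$

Taking expectations on both sides we get (9).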


Example 3. Let $X_1, X_2, \dots, X_n$ be iid $N(\mu, 1)$. Then

$$\log f_\mu(x) = -\tfrac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2},$$

$$\frac{\partial}{\partial\mu}\log f_\mu(x) = x - \mu,$$

and

$$\frac{\partial^2}{\partial\mu^2}\log f_\mu(x) = -1.$$

Hence $I(\mu) = 1$ and $I_n(\mu) = n$.
We next consider an inequality due to Chapman, Robbins, and Kiefer (the CRK 


inequality) that gives a lower bound for the variance of an estimator but does not 
require regularity conditions of the Fréchet-Cramér—Rao type. 


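Theorem 2 (Chapman and Robbins [11], Kiefer [50]). Let $\Theta \subset \mathcal{R}$ and
$\{f_\theta(x) : \theta \in \Theta\}$ be a class of PDFs (PMFs). Let $\psi$ be defined on $\Theta$, and let $T$ be an unbiased
estimator of $\psi(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Theta$. If $\theta \ne \varphi$, assume that $f_\theta$ and $f_\varphi$
are different, and assume further that there exists a $\varphi \in \Theta$ such that $\theta \ne \varphi$ and

$$S(\theta) = \{f_\theta(x) > 0\} \supset S(\varphi) = \{f_\varphi(x) > 0\}. \tag{10}$$

Then

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\{\varphi:\ S(\varphi)\subset S(\theta),\ \varphi\ne\theta\}}
\frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(X)/f_\theta(X)\}} \tag{11}$$

for all $\theta \in \Theta$.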


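Proof. Since $T$ is unbiased for $\psi$, $E_\varphi T(X) = \psi(\varphi)$ for all $\varphi \in \Theta$. Hence, for
$\varphi \ne \theta$,

$$\int_{S(\theta)} T(x)\,\frac{f_\varphi(x) - f_\theta(x)}{f_\theta(x)}\,f_\theta(x)\,dx = \psi(\varphi) - \psi(\theta), \tag{12}$$

which yields

$$\operatorname{cov}_\theta\left[T(X),\ \frac{f_\varphi(X)}{f_\theta(X)} - 1\right] = \psi(\varphi) - \psi(\theta).$$

Using the Cauchy-Schwarz inequality, we get

$$\operatorname{cov}^2_\theta\left[T(X),\ \frac{f_\varphi(X)}{f_\theta(X)} - 1\right]
\le \operatorname{var}_\theta(T(X))\,\operatorname{var}_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)} - 1\right]
= \operatorname{var}_\theta(T(X))\,\operatorname{var}_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)}\right].$$

Thus

$$\operatorname{var}_\theta(T(X)) \ge \frac{[\psi(\varphi) - \psi(\theta)]^2}{\operatorname{var}_\theta\{f_\varphi(X)/f_\theta(X)\}},$$

and the result follows. In the discrete case it is necessary only to replace the integral
on the left side of (12) by a sum. The rest of the proof needs no change.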


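Remark 9. Inequality (11) holds without any regularity conditions on $f_\theta$ or
$\psi(\theta)$. We will show that it covers some nonregular cases of the FCR inequality.
Sometimes (11) is available in an alternative form. Let $\theta$ and $\theta + \delta$ ($\delta \ne 0$) be any
two distinct values in $\Theta$ such that $S(\theta + \delta) \subset S(\theta)$, and take $\psi(\theta) = \theta$. Write

$$J = J(\theta, \delta) = \frac{1}{\delta^2}\left\{\left[\frac{f_{\theta+\delta}(X)}{f_\theta(X)}\right]^2 - 1\right\}.$$

Then (11) can be written as

$$\operatorname{var}_\theta(T(X)) \ge \frac{1}{\inf_\delta E_\theta J}, \tag{13}$$

where the infimum is taken over all $\delta \ne 0$ such that $S(\theta + \delta) \subset S(\theta)$.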


Remark 10. Inequality (11) applies if the parameter space is discrete, but the 
Fréchet-Cramér-Rao regularity conditions do not hold in that case. 


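Example 4. Let $X$ be $U[0, \theta]$. The regularity conditions of the FCR inequality do not
hold in this case. Let $\psi(\theta) = \theta$. If $\varphi < \theta$, then $S(\varphi) \subset S(\theta)$. Also,

$$E_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)}\right]^2 = \int_0^{\varphi}\left(\frac{\theta}{\varphi}\right)^2\frac{1}{\theta}\,dx = \frac{\theta}{\varphi}.$$

Thus

$$\operatorname{var}_\theta(T(X)) \ge \sup_{\varphi:\ \varphi<\theta}\frac{(\varphi - \theta)^2}{(\theta/\varphi) - 1}
= \sup_{\varphi:\ \varphi<\theta}\varphi(\theta - \varphi) = \frac{\theta^2}{4}$$

for any unbiased estimator $T(X)$ of $\theta$. $X$ is a complete sufficient statistic, and $2X$ is
unbiased for $\theta$, so that $T(X) = 2X$ is the UMVUE. Also,

$$\operatorname{var}_\theta(2X) = 4\operatorname{var}X = \frac{\theta^2}{3} > \frac{\theta^2}{4}.$$

Thus the lower bound $\theta^2/4$ of the CRK inequality is not achieved by any unbiased
estimator of $\theta$.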
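
The supremum in Example 4 can be checked numerically. The following is only a sketch: the function name `crk_bound_uniform` and the grid search are illustrative choices, not part of the text, and the variance of the density ratio is taken as $\theta/\varphi - 1$ as computed above.

```python
def crk_bound_uniform(theta: float, grid: int = 10_000) -> float:
    """Approximate sup over 0 < phi < theta of (phi - theta)^2 / var_theta(f_phi / f_theta).

    For U[0, theta], E_theta[(f_phi/f_theta)^2] = theta/phi and E_theta[f_phi/f_theta] = 1,
    so the variance of the ratio is theta/phi - 1.
    """
    best = 0.0
    for i in range(1, grid):                 # phi runs over an evenly spaced grid in (0, theta)
        phi = theta * i / grid
        ratio_var = theta / phi - 1.0
        best = max(best, (phi - theta) ** 2 / ratio_var)
    return best


theta = 2.0
print(crk_bound_uniform(theta))   # CRK bound, close to theta**2 / 4
print(theta ** 2 / 3)             # variance of the UMVUE 2X, which is strictly larger
```

The grid value agrees with $\theta^2/4$, attained at $\varphi = \theta/2$, while the UMVUE has variance $\theta^2/3$.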


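Example 5. Let $X$ have PMF

$$P_N\{X = k\} = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \dots, N, \\ 0, & \text{otherwise.} \end{cases}$$

Let $\Theta = \{N : N \ge M,\ M > 1 \text{ given}\}$. Take $\psi(N) = N$. Although the FCR regularity
conditions do not hold, (11) is applicable since for $N \ne N' \in \Theta \subset \mathcal{R}$,

$$S(N) = \{1, 2, \dots, N\} \supset S(N') = \{1, 2, \dots, N'\} \quad\text{if } N' < N.$$

Also, $P_N$ and $P_{N'}$ are different for $N \ne N'$. Thus

$$\operatorname{var}_N(T) \ge \sup_{N' < N}\frac{(N - N')^2}{\operatorname{var}_N\{P_{N'}/P_N\}}.$$

Now

$$\frac{P_{N'}}{P_N}(x) = \frac{P_{N'}(x)}{P_N(x)} = \begin{cases} \dfrac{N}{N'}, & x = 1, 2, \dots, N',\ N' < N, \\ 0, & \text{otherwise,} \end{cases}$$

$$E_N\left[\frac{P_{N'}(X)}{P_N(X)}\right]^2 = \frac{1}{N}\sum_{1}^{N'}\left(\frac{N}{N'}\right)^2 = \frac{N}{N'},$$

and

$$\operatorname{var}_N\left[\frac{P_{N'}(X)}{P_N(X)}\right] = \frac{N}{N'} - 1 > 0 \quad\text{for } N > N'.$$

It follows that

$$\operatorname{var}_N(T(X)) \ge \sup_{N' < N}\frac{(N - N')^2}{(N - N')/N'} = \sup_{N' < N} N'(N - N').$$

Now

$$\frac{k(N - k)}{(k - 1)(N - k + 1)} > 1 \quad\text{if and only if}\quad k < \frac{N + 1}{2},$$

so that $N'(N - N')$ increases as long as $N' < (N + 1)/2$ and decreases if $N' >
(N + 1)/2$. The maximum is achieved at $N' = [(N + 1)/2]$ if $M \le (N + 1)/2$ and
at $N' = M$ if $M > (N + 1)/2$, where $[x]$ is the largest integer $\le x$. Therefore,

$$\operatorname{var}_N(T(X)) \ge \left[\frac{N+1}{2}\right]\left\{N - \left[\frac{N+1}{2}\right]\right\} \quad\text{if } M \le \frac{N+1}{2},$$

and

$$\operatorname{var}_N(T(X)) \ge M(N - M) \quad\text{if } M > \frac{N+1}{2}.$$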
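
The closed-form bound of Example 5 can be compared with a direct maximization of $N'(N - N')$ over the admissible values $M \le N' < N$. This is a sketch only; the function names are hypothetical, and $N > M$ is assumed so that at least one $N'$ is admissible.

```python
def crk_bound_brute_force(N: int, M: int) -> int:
    """Maximize N'(N - N') over the admissible values M <= N' < N (requires M < N)."""
    return max(k * (N - k) for k in range(M, N))


def crk_bound_closed_form(N: int, M: int) -> int:
    """Closed form from the example: the maximizer is [(N+1)/2] or M, whichever is larger."""
    half = (N + 1) // 2                      # [x] = largest integer <= x
    if M <= (N + 1) / 2:
        return half * (N - half)
    return M * (N - M)


print(crk_bound_brute_force(7, 2), crk_bound_closed_form(7, 2))
print(crk_bound_brute_force(6, 4), crk_bound_closed_form(6, 4))
```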


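Example 6. Let $X_1, X_2, \dots, X_n$ be a sample from $N(0, \sigma^2)$. Let us compute $J$ (see
Remark 9) for $\delta \ne 0$:

$$J = \frac{1}{\delta^2}\left\{\left[\frac{f_{\sigma+\delta}(\mathbf{X})}{f_\sigma(\mathbf{X})}\right]^2 - 1\right\}
= \frac{1}{\delta^2}\left\{\frac{\sigma^{2n}}{(\sigma+\delta)^{2n}}\exp\left[-\frac{\sum X_i^2}{(\sigma+\delta)^2} + \frac{\sum X_i^2}{\sigma^2}\right] - 1\right\}$$

$$= \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}\exp\left[\frac{\sum X_i^2(\delta^2 + 2\sigma\delta)}{\sigma^2(\sigma+\delta)^2}\right] - 1\right\},$$

and

$$E_\sigma J = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma+\delta}\right)^{2n} E_\sigma\exp\left[c\,\frac{\sum X_i^2}{\sigma^2}\right] - 1\right\},$$

where $c = (\delta^2 + 2\sigma\delta)/(\sigma + \delta)^2$.
Since $\sum X_i^2/\sigma^2 \sim \chi^2(n)$,

$$E_\sigma J = \frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}\frac{1}{(1 - 2c)^{n/2}} - 1\right\} \quad\text{for } c < \tfrac{1}{2}.$$

Let $k = \delta/\sigma$; then

$$c = \frac{2k + k^2}{(1+k)^2}, \qquad 1 - 2c = \frac{1 - 2k - k^2}{(1+k)^2},$$

and

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left\{(1+k)^{-n}(1 - 2k - k^2)^{-n/2} - 1\right\}.$$

Here $1 + k > 0$ and $1 - 2c > 0$, so that $1 - 2k - k^2 > 0$, implying that $-\sqrt{2} <
k + 1 < \sqrt{2}$ and also that $k > -1$. Thus $-1 < k < \sqrt{2} - 1$ and $k \ne 0$. Also,

$$\lim_{k\to 0} E_\sigma J = \lim_{k\to 0}\frac{(1+k)^{-n}(1 - 2k - k^2)^{-n/2} - 1}{k^2\sigma^2} = \frac{2n}{\sigma^2}$$

by L'Hospital's rule. We leave the reader to check that the reciprocal, $\sigma^2/(2n)$, is the FCR lower bound
for $\operatorname{var}_\sigma(T(\mathbf{X}))$. But the minimum value of $E_\sigma J$ is not achieved in the neighborhood
of $k = 0$, so that the CRK inequality is sharper than the FCR inequality. Next, we
show that for $n = 2$ we can do better with the CRK inequality. We have

$$E_\sigma J = \frac{1}{k^2\sigma^2}\left[\frac{1}{(1 - 2k - k^2)(1+k)^2} - 1\right]
= \frac{(k+2)^2}{\sigma^2(1+k)^2(1 - 2k - k^2)}, \qquad -1 < k < \sqrt{2} - 1,\ k \ne 0.$$

For $k = -0.1607$ we achieve the lower bound as $(E_\sigma J)^{-1} = 0.2698\sigma^2$, so that
$\operatorname{var}_\sigma(T(\mathbf{X})) \ge 0.2698\sigma^2 > \sigma^2/4$. Finally, we show that this bound is by no means
the best available; it is possible to improve on the Chapman-Robbins-Kiefer bounds,
too, in some cases. Take

$$T(X_1, X_2, \dots, X_n) = \frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\,\frac{\sigma}{\sqrt{2}}\sqrt{\frac{\sum X_i^2}{\sigma^2}}$$

to be an estimate of $\sigma$. Now $E_\sigma T = \sigma$ and

$$E_\sigma T^2 = \frac{\sigma^2}{2}\left[\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\right]^2 E_\sigma\left(\frac{\sum X_i^2}{\sigma^2}\right)
= \frac{n\sigma^2}{2}\left[\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\right]^2,$$

so that

$$\operatorname{var}_\sigma(T) = \sigma^2\left\{\frac{n}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\right)^2 - 1\right\}.$$

For $n = 2$,

$$\operatorname{var}_\sigma(T) = \sigma^2\left(\frac{4}{\pi} - 1\right) = 0.2732\sigma^2,$$

which is $> 0.2698\sigma^2$, the CRK bound. Note that $T$ is the UMVUE.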
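
The three numbers compared in Example 6 (FCR bound, CRK bound, variance of the UMVUE) can be reproduced with a short computation. This is a sketch under the formulas derived above; the function names, the grid search over $k$, and the default $\sigma = 1$ are illustrative assumptions.

```python
import math


def expected_J(k: float, n: int, sigma: float = 1.0) -> float:
    """E_sigma J for delta = k * sigma; valid for -1 < k < sqrt(2) - 1, k != 0."""
    return ((1 + k) ** (-n) * (1 - 2 * k - k * k) ** (-n / 2) - 1) / (k * k * sigma * sigma)


def crk_bound(n: int, sigma: float = 1.0, steps: int = 20_000) -> float:
    """Grid approximation of 1 / inf_k E_sigma J over the admissible range of k."""
    lo, hi = -1.0, math.sqrt(2) - 1
    best = 0.0
    for i in range(1, steps):
        k = lo + (hi - lo) * i / steps
        if abs(k) < 1e-9:                    # k = 0 is excluded from the range
            continue
        best = max(best, 1.0 / expected_J(k, n, sigma))
    return best


def var_umvue(n: int, sigma: float = 1.0) -> float:
    """Variance of T = Gamma(n/2) / Gamma((n+1)/2) * sqrt(sum X_i^2 / 2)."""
    ratio = math.exp(math.lgamma(n / 2) - math.lgamma((n + 1) / 2))
    return sigma ** 2 * (n / 2 * ratio ** 2 - 1)


n = 2
print(1 / (2 * n))        # FCR bound sigma^2 / (2n) with sigma = 1
print(crk_bound(n))       # CRK bound, about 0.2698
print(var_umvue(n))       # 4/pi - 1, about 0.2732
```

For $n = 2$ the output is ordered as in the text: the FCR bound is below the CRK bound, which in turn is below the variance of the UMVUE.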


Remark 11. In general the CRK inequality is at least as sharp as the FCR inequality. See
Chapman and Robbins [11, pp. 584-585] for details.


We next introduce the concept of efficiency. 


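Definition 2. Let $T_1$, $T_2$ be two unbiased estimators for a parameter $\theta$. Suppose
that $E_\theta T_1^2 < \infty$, $E_\theta T_2^2 < \infty$. We define the efficiency of $T_1$ relative to $T_2$ by

$$\operatorname{eff}_\theta(T_1 \mid T_2) = \frac{\operatorname{var}_\theta(T_2)}{\operatorname{var}_\theta(T_1)} \tag{14}$$

and say that $T_1$ is more efficient than $T_2$ if

$$\operatorname{eff}_\theta(T_1 \mid T_2) > 1. \tag{15}$$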


It is usual to consider the performance of an unbiased estimator by comparing its 
variance with the lower bound given by the FCR inequality. 


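Definition 3. Assume that the regularity conditions of the FCR inequality are satisfied
by the family of DFs $\{F_\theta, \theta \in \Theta\}$, $\Theta \subseteq \mathcal{R}$. We say that an unbiased estimator
$T$ for parameter $\theta$ is most efficient for the family $\{F_\theta\}$ if

$$\operatorname{var}_\theta(T) = \left\{E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2\right\}^{-1} = I_n(\theta). \tag{16}$$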


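Definition 4. Let $T$ be the most efficient estimator for the regular family of DFs
$\{F_\theta, \theta \in \Theta\}$. Then the efficiency of any unbiased estimator $T_1$ of $\theta$ is defined as

$$\operatorname{eff}_\theta(T_1) = \operatorname{eff}_\theta(T_1 \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(T_1)} = \frac{I_n(\theta)}{\operatorname{var}_\theta(T_1)}. \tag{17}$$

Clearly, the efficiency of the most efficient estimator is 1, and the efficiency of
any unbiased estimator $T_1$ is $\le 1$.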


Definition 5. We say that an estimator $T_1$ is asymptotically (most) efficient if

$$\lim_{n\to\infty}\operatorname{eff}_\theta(T_1) = 1 \tag{18}$$

and $T_1$ is at least asymptotically unbiased in the sense that $\lim_{n\to\infty} E_\theta T_1 = \theta$. Here
$n$ is the sample size.


Remark 12. Definition 3, although in common use, has many drawbacks. We 
have already seen cases in which the regularity conditions are not satisfied and yet 
UMVUEs exist. The definition does not cover such cases. Moreover, in many cases 
where the regularity conditions are satisfied and UMVUEs exist, the UMVUE is not 
most efficient since the variance of the best estimator (the UMVUE) does not achieve 
the lower bound of the FCR inequality. 
Example 7. Let $X \sim b(n, p)$. Then we have seen in Example 1 that $X/n$ is
the UMVUE since its variance achieves the lower bound of the FCR inequality. It
follows that $X/n$ is most efficient.


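Example 8. Let $X_1, X_2, \dots, X_n$ be iid $P(\lambda)$ RVs and suppose that $\psi(\lambda) =
P_\lambda(X = 0) = e^{-\lambda}$. From Example 2, the UMVUE of $\psi$ is given by
$T_0 = [(n-1)/n]^{\sum_{i=1}^n X_i}$ with

$$\operatorname{var}_\lambda(T_0) = e^{-2\lambda}(e^{\lambda/n} - 1).$$

Also, $I_n(\theta) = (\lambda e^{-2\lambda})/n$. It follows that

$$\operatorname{eff}_\lambda(T_0) = \frac{(\lambda e^{-2\lambda})/n}{e^{-2\lambda}(e^{\lambda/n} - 1)} < \frac{(\lambda e^{-2\lambda})/n}{e^{-2\lambda}(\lambda/n)} = 1,$$

since $e^x - 1 > x$ for $x > 0$. Thus $T_0$ is not most efficient. However, since $\operatorname{eff}_\lambda(T_0) \to
1$ as $n \to \infty$, $T_0$ is asymptotically efficient.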
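
The behavior of the efficiency of $T_0$ as $n$ grows can be tabulated directly from the two expressions in Example 8. The function name `efficiency_T0` is a hypothetical helper for illustration only.

```python
import math


def efficiency_T0(lam: float, n: int) -> float:
    """Ratio of the FCR lower bound to var(T_0) for T_0 = ((n-1)/n) ** sum(X_i)."""
    bound = lam * math.exp(-2 * lam) / n
    var_t0 = math.exp(-2 * lam) * math.expm1(lam / n)   # expm1(x) = e^x - 1, accurate for small x
    return bound / var_t0


for n in (1, 10, 100, 1000):
    print(n, efficiency_T0(1.0, n))   # increases toward 1 but stays below it
```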


In view of Remarks 6 and 7, the following result describes the relationship
between most efficient unbiased estimators and UMVUEs.


Theorem 3. A necessary and sufficient condition for an unbiased estimator $T$ of
$\psi$ to be most efficient is that $T$ be sufficient and that relation (8) hold for some
function $k(\theta)$.

Clearly, an estimator $T$ satisfying the conditions of Theorem 3 will be the
UMVUE, and the two estimators coincide. We emphasize that we have assumed the
regularity conditions of the FCR inequality in making this statement.

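Example 9. Let $(X, Y)$ be jointly distributed with PDF

$$f_\theta(x, y) = \exp\left[-\left(\frac{x}{\theta} + \theta y\right)\right], \qquad x > 0,\ y > 0.$$

For a sample $(x, y)$ of size 1, we have

$$\frac{\partial}{\partial\theta}\log f_\theta(x, y) = -\frac{\partial}{\partial\theta}\left(\frac{x}{\theta} + \theta y\right) = \frac{x}{\theta^2} - y.$$

Hence, the information for this sample is

$$I(\theta) = E_\theta\left(Y - \frac{X}{\theta^2}\right)^2 = E_\theta(Y^2) + \frac{E_\theta(X^2)}{\theta^4} - \frac{2E_\theta(XY)}{\theta^2}.$$

Now

$$E_\theta(Y^2) = \frac{2}{\theta^2}, \qquad E_\theta(X^2) = 2\theta^2, \qquad\text{and}\qquad E_\theta(XY) = 1,$$

so that

$$I(\theta) = \frac{2}{\theta^2} + \frac{2}{\theta^2} - \frac{2}{\theta^2} = \frac{2}{\theta^2}.$$

Therefore, the Fisher information in a sample of $n$ pairs is $2n/\theta^2$.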
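We return to Example 8.3.23, where $X_1, X_2, \dots, X_n$ are iid $G(1, \theta)$ and $Y_1, Y_2,
\dots, Y_n$ are iid $G(1, 1/\theta)$, and the $X$'s and $Y$'s are independent. Then $(X_i, Y_i)$ has
common PDF $f_\theta(x, y)$ given above. We will compute Fisher's information for $\theta$ in the
family of PDFs of $S(\mathbf{X}, \mathbf{Y}) = (\sum X_i/\sum Y_i)^{1/2}$. Using the PDFs of $\sum X_i \sim G(n, \theta)$
and $\sum Y_i \sim G(n, 1/\theta)$ and the transformation technique, it is easy to see that
$S(\mathbf{X}, \mathbf{Y})$ has PDF

$$g_\theta(s) = \frac{2\Gamma(2n)}{[\Gamma(n)]^2}\,s^{-1}\left(\frac{s}{\theta} + \frac{\theta}{s}\right)^{-2n}, \qquad s > 0.$$

Thus

$$\frac{\partial\log g_\theta(s)}{\partial\theta} = -2n\left(-\frac{s}{\theta^2} + \frac{1}{s}\right)\left(\frac{s}{\theta} + \frac{\theta}{s}\right)^{-1}.$$

It follows that

$$E_\theta\left[\frac{\partial}{\partial\theta}\log g_\theta(S)\right]^2
= \frac{4n^2}{\theta^2}E_\theta\left\{1 - 4\left(\frac{S}{\theta} + \frac{\theta}{S}\right)^{-2}\right\}
= \frac{4n^2}{\theta^2}\left[1 - \frac{2n}{2n+1}\right]
= \frac{2n}{\theta^2}\cdot\frac{2n}{2n+1} < \frac{2n}{\theta^2}.$$

That is, the information about $\theta$ in $S$ is smaller than that in the sample.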
The Fisher information in the conditional PDF of $S$ given $A = a$, where
$A(\mathbf{X}, \mathbf{Y}) = S_1(\mathbf{X})S_2(\mathbf{Y})$, can be shown (Problem 12) to equal

$$\frac{2a}{\theta^2}\,\frac{K_1(2a)}{K_0(2a)},$$

where $K_0$ and $K_1$ are Bessel functions of order 0 and 1, respectively. Averaging over
all values of $A$, one can show that the information is $2n/\theta^2$, which is the total Fisher
information in the sample of $n$ pairs $(x_i, y_i)$.
PROBLEMS 8.5


1. Are the following families of distributions regular in the sense of Fréchet,
Cramér, and Rao? If so, find the lower bound for the variance of an unbiased
estimator based on a sample of size $n$.

(a) $f_\theta(x) = \theta^{-1}e^{-x/\theta}$ if $x > 0$, and $= 0$ otherwise; $\theta > 0$.
(b) $f_\theta(x) = e^{-(x-\theta)}$ if $\theta < x < \infty$, and $= 0$ otherwise.
(c) $f_\theta(x) = \theta(1 - \theta)^x$, $x = 0, 1, 2, \dots$; $0 < \theta < 1$.
(d) $f(x; \sigma^2) = (1/\sigma\sqrt{2\pi})\,e^{-x^2/2\sigma^2}$, $-\infty < x < \infty$, $\sigma^2 > 0$.


2. Find the CRK lower bound for the variance of an unbiased estimator of $\theta$, based
on a sample of size $n$ from the PDF of Problem 1(b).

3. Find the CRK bound for the variance of an unbiased estimator of $\theta$ in sampling
from $N(\theta, 1)$.

4. In Problem 1 check to see whether there exists a most efficient estimator in each
case.

5. Let $X_1, X_2, \dots, X_n$ be a sample from a three-point distribution:

$$P\{X = y_1\} = \frac{1-\theta}{2}, \qquad P\{X = y_2\} = \frac{1}{2}, \qquad\text{and}\qquad P\{X = y_3\} = \frac{\theta}{2},$$

where $0 < \theta < 1$. Does the FCR inequality apply in this case? If so, what is the
lower bound for the variance of an unbiased estimator of $\theta$?

6. Let $X_1, X_2, \dots, X_n$ be iid RVs with mean $\mu$ and finite variance. What is the
efficiency of the unbiased (and consistent) estimator $[2/n(n+1)]\sum_{i=1}^n iX_i$ relative
to $\bar X$?

7. When does the equality hold in the CRK inequality?

8. Let $X_1, X_2, \dots, X_n$ be a sample from $N(\mu, 1)$, and let $d(\mu) = \mu^2$.
(a) Show that the minimum variance of any estimator of $\mu^2$ from the FCR
inequality is $4\mu^2/n$.
(b) Show that $T(X_1, X_2, \dots, X_n) = \bar X^2 - 1/n$ is the UMVUE of $\mu^2$ with
variance $(4\mu^2/n + 2/n^2)$.

9. Let $X_1, X_2, \dots, X_n$ be iid $G(1, 1/\alpha)$ RVs.
(a) Show that the estimator $T(X_1, X_2, \dots, X_n) = (n-1)/(n\bar X)$ is the UMVUE
for $\alpha$ with variance $\alpha^2/(n-2)$.
(b) Show that the minimum variance from the FCR inequality is $\alpha^2/n$.
10. In Problem 8.4.16, compute the relative efficiency of $\hat\mu_0$ with respect to $\hat\mu_1$.


11. Let $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_m$ be independent samples from $N(\mu, \sigma_1^2)$
and $N(\mu, \sigma_2^2)$, respectively, where $\mu$, $\sigma_1^2$, $\sigma_2^2$ are unknown. Let $\rho = \sigma_2^2/\sigma_1^2$ and
$\theta = m/n$, and consider the problem of unbiased estimation of $\mu$.
(a) If $\rho$ is known, show that

$$\hat\mu_0 = \alpha\bar X + (1 - \alpha)\bar Y,$$

where $\alpha = \rho/(\rho + \theta)$, is the BLUE of $\mu$. Compute $\operatorname{var}(\hat\mu_0)$.
(b) If $\rho$ is unknown, the unbiased estimator

$$\bar\mu = \frac{\bar X + \theta\bar Y}{1 + \theta}$$

is optimum in the neighborhood of $\rho = 1$. Find the variance of $\bar\mu$.
(c) Compute the efficiency of $\bar\mu$ relative to $\hat\mu_0$.
(d) Another unbiased estimator of $\mu$ is

$$\hat\mu = \frac{\rho F\bar X + \theta\bar Y}{\theta + \rho F},$$

where $F = S_2^2/\rho S_1^2$ is an $F(m-1, n-1)$ RV.


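12. Show that the Fisher information on $\theta$ based on the PDF

$$\frac{1}{2K_0(2a)}\,\frac{1}{s}\exp\left[-a\left(\frac{s}{\theta} + \frac{\theta}{s}\right)\right], \qquad s > 0,$$

for fixed $a$ equals $(2a/\theta^2)[K_1(2a)/K_0(2a)]$, where $K_0(2a)$ and $K_1(2a)$ are
Bessel functions of order 0 and 1, respectively.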


8.6 SUBSTITUTION PRINCIPLE (METHOD OF MOMENTS) 


One of the simplest and oldest methods of estimation is the substitution principle:
Let $\psi(\theta)$, $\theta \in \Theta$, be a parametric function to be estimated on the basis of a random
sample $X_1, X_2, \dots, X_n$ from a population DF $F$. Suppose that we can write $\psi(\theta) =
h(F)$ for some known function $h$. Then the substitution principle estimator of $\psi(\theta)$
is $h(F_n^*)$, where $F_n^*$ is the sample distribution function. Accordingly, we estimate
$\mu = \mu(F)$ by $\mu(F_n^*) = \bar X$, $m_k = E_F X^k$ by $\sum_{j=1}^n X_j^k/n$, and so on. The method of
moments is a special case when we need to estimate some known function of a finite
number of unknown moments. Let us suppose that we are interested in estimating

$$\theta = h(m_1, m_2, \dots, m_k), \tag{1}$$

where $h$ is some known numerical function and $m_j$ is the $j$th-order moment of the
population distribution that is known to exist for $1 \le j \le k$.
Definition 1. The method of moments consists in estimating $\theta$ by the statistic

$$T(X_1, \dots, X_n) = h\left(n^{-1}\sum_{1}^{n} X_i,\ n^{-1}\sum_{1}^{n} X_i^2,\ \dots,\ n^{-1}\sum_{1}^{n} X_i^k\right). \tag{2}$$

To make sure that $T$ is a statistic, we will assume that $h : \mathcal{R}_k \to \mathcal{R}$ is a Borel-measurable
function.


Remark 1. It is easy to extend the method to the estimation of joint moments.
Thus we use $n^{-1}\sum_{1}^{n} X_iY_i$ to estimate $E(XY)$, and so on.


Remark 2. From the WLLN, $n^{-1}\sum_{i=1}^n X_i^j \xrightarrow{P} EX^j$. Thus, if one is interested
in estimating the population moments, the method of moments leads to consistent
and unbiased estimators. Moreover, the method of moments estimators in this case
are asymptotically normally distributed (see Section 7.5).

Again, if one estimates parameters of the type $\theta$ defined in (1) and $h$ is a continuous
function, the estimators $T(X_1, X_2, \dots, X_n)$ defined in (2) are consistent for $\theta$
(see Problem 1). Under some mild conditions on $h$, the estimator $T$ is also asymptotically
normal (see Cramér [16, pp. 386-387]).


Example 1. Let $X_1, X_2, \dots, X_n$ be iid RVs with common mean $\mu$ and variance
$\sigma^2$. Then $\sigma = \sqrt{m_2 - m_1^2}$, and the method of moments estimator for $\sigma$ is given by

$$T(X_1, \dots, X_n) = \sqrt{\frac{1}{n}\sum_{1}^{n} X_i^2 - \bar X^2}.$$

Although $T$ is consistent and asymptotically normal for $\sigma$, it is not unbiased.

In particular, if $X_1, X_2, \dots, X_n$ are iid $P(\lambda)$ RVs, we know that $EX_1 = \lambda$ and
$\operatorname{var}(X_1) = \lambda$. The method of moments leads to using either $\bar X$ or $\sum_{1}^{n}(X_i - \bar X)^2/n$
as an estimator of $\lambda$. To avoid this kind of ambiguity we take the estimator involving
the lowest-order sample moment.


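Example 2. Let $X_1, X_2, \dots, X_n$ be a sample from

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$EX = \frac{a+b}{2} \qquad\text{and}\qquad \operatorname{var}(X) = \frac{(b-a)^2}{12}.$$

The method of moments leads to estimating $EX$ by $\bar X$ and $\operatorname{var}(X)$ by $\sum_{1}^{n}(X_i -
\bar X)^2/n$, so that the estimators for $a$ and $b$, respectively, are

$$T_1(X_1, \dots, X_n) = \bar X - \sqrt{\frac{3\sum_{1}^{n}(X_i - \bar X)^2}{n}}$$

and

$$T_2(X_1, \dots, X_n) = \bar X + \sqrt{\frac{3\sum_{1}^{n}(X_i - \bar X)^2}{n}}.$$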


Example 3. Let $X_1, X_2, \dots, X_N$ be iid $b(n, p)$ RVs, where both $n$ and $p$ are
unknown. The method of moments estimators of $p$ and $n$ are given by

$$\bar X = EX = np$$

and

$$\frac{1}{N}\sum_{1}^{N} X_i^2 = EX^2 = np(1-p) + n^2p^2.$$

Solving for $n$ and $p$, we get the estimator for $p$ as

$$T_1(X_1, \dots, X_N) = \frac{\bar X}{T_2(X_1, \dots, X_N)},$$

where $T_2(X_1, \dots, X_N)$ is the estimator for $n$, given by

$$T_2(X_1, X_2, \dots, X_N) = \frac{(\bar X)^2}{\bar X + \bar X^2 - \left(\sum_{1}^{N} X_i^2/N\right)}.$$

Note that $\bar X \xrightarrow{P} np$ and $\sum_{1}^{N} X_i^2/N \xrightarrow{P} np(1-p) + n^2p^2$, so that both $T_1$ and $T_2$ are
consistent estimators.
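
The two estimators of Example 3 are easy to compute from data. The sketch below uses a hypothetical helper name, `binomial_moment_estimators`, and raises an error when the denominator of $T_2$ is not positive, a case in which the formula returns a meaningless value for $n$.

```python
def binomial_moment_estimators(xs):
    """Method-of-moments estimates (n_hat, p_hat) for iid b(n, p) data, both parameters unknown."""
    N = len(xs)
    m1 = sum(xs) / N                         # first sample moment, X-bar
    m2 = sum(x * x for x in xs) / N          # second sample moment
    denom = m1 + m1 * m1 - m2                # X-bar + X-bar^2 - sum(X_i^2)/N
    if denom <= 0:
        # Happens when the sample variance is at least the sample mean.
        raise ValueError("moment equations give no admissible estimate of n")
    n_hat = m1 * m1 / denom
    p_hat = m1 / n_hat
    return n_hat, p_hat


print(binomial_moment_estimators([1, 3, 1, 3]))   # sample moments match those of b(4, 1/2)
```

A sample such as `[0, 4, 0, 4]`, whose variance exceeds its mean, triggers the error branch; this is one concrete way in which the method can fail to give a sensible estimate.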

The method of moments may lead to absurd estimators. The reader is asked to compute
estimators of $\theta$ in $N(\theta, \theta)$ or $N(\theta, \theta^2)$ by the method of moments and verify
this assertion.


PROBLEMS 8.6 


1. Let $X_n \xrightarrow{P} a$ and $Y_n \xrightarrow{P} b$, where $a$ and $b$ are constants. Let $h : \mathcal{R}_2 \to \mathcal{R}$ be a
continuous function. Show that $h(X_n, Y_n) \xrightarrow{P} h(a, b)$.

2. Let $X_1, X_2, \dots, X_n$ be a sample from $G(\alpha, \beta)$. Find the method of moments
estimator for $(\alpha, \beta)$.

3. Let $X_1, X_2, \dots, X_n$ be a sample from $N(\mu, \sigma^2)$. Find the method of moments
estimator for $(\mu, \sigma^2)$.

4. Let $X_1, X_2, \dots, X_n$ be a sample from $B(\alpha, \beta)$. Find the method of moments
estimator for $(\alpha, \beta)$.

5. A random sample of size $n$ is taken from the lognormal PDF

$$f(x; \mu, \sigma) = (\sigma\sqrt{2\pi})^{-1}x^{-1}\exp\left[-\frac{1}{2\sigma^2}(\log x - \mu)^2\right], \qquad x > 0.$$

Find the method of moments estimators for $\mu$ and $\sigma^2$.


8.7 MAXIMUM LIKELIHOOD ESTIMATORS 


In this section we study a frequently used method of estimation, namely, the method 
of maximum likelihood estimation. Consider the following example. 


Example 1. Let $X \sim b(n, p)$. One observation on $X$ is available, and it is known
that $n$ is either 2 or 3 and $p = \frac12$ or $\frac13$. Our objective is to estimate the pair $(n, p)$.
The following table gives the probability that $X = x$ for each possible pair $(n, p)$:

                                                      Maximum
 x     (2, 1/2)    (2, 1/3)    (3, 1/2)    (3, 1/3)    Probability
 0       1/4         4/9         1/8         8/27         4/9
 1       1/2         4/9         3/8        12/27         1/2
 2       1/4         1/9         3/8         6/27         3/8
 3        0           0          1/8         1/27         1/8

The last column gives the maximum probability in each row, that is, for each value
that $X$ assumes. If the value $x = 1$, say, is observed, it is more probable that it came
from the distribution $b(2, \frac12)$ than from any of the other distributions, and so on. The
following estimator is therefore reasonable in that it maximizes the probability of the
value observed:

$$(\hat n, \hat p)(x) = \begin{cases} (2, \frac13) & \text{if } x = 0, \\ (2, \frac12) & \text{if } x = 1, \\ (3, \frac12) & \text{if } x = 2, \\ (3, \frac12) & \text{if } x = 3. \end{cases}$$
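
The table of Example 1 and the resulting estimator can be regenerated with exact rational arithmetic. This is a sketch; the names `candidates`, `pmf`, and `mle` are illustrative and not from the text.

```python
from fractions import Fraction
from math import comb

# The four admissible parameter pairs (n, p) of the example.
candidates = [(2, Fraction(1, 2)), (2, Fraction(1, 3)),
              (3, Fraction(1, 2)), (3, Fraction(1, 3))]


def pmf(x, n, p):
    """Binomial probability P{X = x} under b(n, p), as an exact fraction."""
    if not 0 <= x <= n:
        return Fraction(0)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def mle(x):
    """Pair (n, p) among the candidates that maximizes the probability of the observed x."""
    return max(candidates, key=lambda pair: pmf(x, *pair))


for x in range(4):
    row = [pmf(x, n, p) for n, p in candidates]
    print(x, [str(q) for q in row], "max:", max(row), "at", mle(x))
```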




The principle of maximum likelihood essentially assumes that the sample is representative
of the population and chooses as the estimator that value of the parameter
which maximizes the PDF (PMF) $f_\theta(x)$.


Definition 1. Let $(X_1, X_2, \dots, X_n)$ be a random vector with PDF (PMF)
$f_\theta(x_1, x_2, \dots, x_n)$, $\theta \in \Theta$. The function

$$L(\theta; x_1, x_2, \dots, x_n) = f_\theta(x_1, x_2, \dots, x_n), \tag{1}$$

considered as a function of $\theta$, is called the likelihood function.


Usually, $\theta$ will be a multiple parameter. If $X_1, X_2, \dots, X_n$ are iid with PDF
(PMF) $f_\theta(x)$, the likelihood function is

$$L(\theta; x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f_\theta(x_i). \tag{2}$$

Let $\Theta \subseteq \mathcal{R}_k$ and $\mathbf{X} = (X_1, X_2, \dots, X_n)$.


Definition 2. The principle of maximum likelihood estimation consists of choosing
as an estimator of $\theta$ a $\hat\theta(\mathbf{X})$ that maximizes $L(\theta; x_1, x_2, \dots, x_n)$, that is, to find
a mapping $\hat\theta$ of $\mathcal{R}_n \to \mathcal{R}_k$ that satisfies

$$L(\hat\theta; x_1, x_2, \dots, x_n) = \sup_{\theta\in\Theta} L(\theta; x_1, x_2, \dots, x_n). \tag{3}$$

(Constants are not admissible as estimators.) If a $\hat\theta$ satisfying (3) exists, we call it a
maximum likelihood estimator (MLE).


It is convenient to work with the logarithm of the likelihood function. Since log is
a monotone function,

$$\log L(\hat\theta; x_1, \dots, x_n) = \sup_{\theta\in\Theta}\log L(\theta; x_1, \dots, x_n). \tag{4}$$

Let $\Theta$ be an open subset of $\mathcal{R}_k$, and suppose that $f_\theta(\mathbf{x})$ is a positive, differentiable
function of $\theta$ (that is, the first-order partial derivatives exist in the components of $\theta$).
If a supremum $\hat\theta$ exists, it must satisfy the likelihood equations

$$\frac{\partial\log L(\hat\theta; x_1, \dots, x_n)}{\partial\theta_j} = 0, \qquad j = 1, 2, \dots, k, \quad \theta = (\theta_1, \dots, \theta_k). \tag{5}$$

Any nontrivial root of the likelihood equations (5) is called an MLE in the loose
sense. A parameter value that provides the absolute maximum of the likelihood function
is called an MLE in the strict sense or, simply, an MLE.
Remark 1. If $\Theta \subseteq \mathcal{R}$, there may still be many problems. Often, the likelihood
equation $\partial L/\partial\theta = 0$ has more than one root, or the likelihood function is not
differentiable everywhere in $\Theta$, or $\hat\theta$ may be a terminal value. Sometimes the likelihood
equation may be quite complicated and difficult to solve explicitly. In that case one
may have to resort to some numerical procedure to obtain the estimator. Similar
remarks apply to the multiparameter case.


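Example 2. Let $X_1, X_2, \dots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and
$\sigma^2$ are unknown. Here $\Theta = \{(\mu, \sigma^2),\ -\infty < \mu < \infty,\ \sigma^2 > 0\}$. The likelihood
function is

$$L(\mu, \sigma^2; x_1, \dots, x_n) = \frac{1}{\sigma^n(2\pi)^{n/2}}\exp\left[-\sum_{1}^{n}\frac{(x_i - \mu)^2}{2\sigma^2}\right],$$

and

$$\log L(\mu, \sigma^2; \mathbf{x}) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{\sum_{1}^{n}(x_i - \mu)^2}{2\sigma^2}.$$

The likelihood equations are

$$\frac{1}{\sigma^2}\sum_{1}^{n}(x_i - \mu) = 0$$

and

$$-\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{1}^{n}(x_i - \mu)^2 = 0.$$

Solving the first of these equations for $\mu$, we get $\hat\mu = \bar X$ and, substituting in the
second, $\hat\sigma^2 = \sum_{i=1}^{n}(X_i - \bar X)^2/n$. We see that $(\hat\mu, \hat\sigma^2) \in \Theta$ with probability 1. We
show that $(\hat\mu, \hat\sigma^2)$ maximizes the likelihood function. First note that $\bar X$ maximizes
$L(\mu, \sigma^2; \mathbf{x})$ whatever $\sigma^2$ is, since $L(\mu, \sigma^2; \mathbf{x}) \to 0$ as $|\mu| \to \infty$, and in that case
$L(\hat\mu, \sigma^2; \mathbf{x}) \to 0$ as $\sigma^2 \to 0$ or $\infty$ whenever $\hat\theta \in \Theta$, $\hat\theta = (\hat\mu, \hat\sigma^2)$.

Note that $\hat\sigma^2$ is not unbiased for $\sigma^2$. Indeed, $E\hat\sigma^2 = [(n-1)/n]\sigma^2$. But $n\hat\sigma^2/(n-1) = S^2$
is unbiased, as we already know. Also, $\hat\mu$ is unbiased, and both $\hat\mu$ and $\hat\sigma^2$ are
consistent. In addition, $\hat\mu$ and $\hat\sigma^2$ are method of moments estimators for $\mu$ and $\sigma^2$,
and $(\hat\mu, \hat\sigma^2)$ is jointly sufficient.

Finally, note that $\hat\mu$ is the MLE of $\mu$ if $\sigma^2$ is known; but if $\mu$ is known, the MLE
of $\sigma^2$ is not $\hat\sigma^2$ but $\sum_{1}^{n}(X_i - \mu)^2/n$.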


Example 3. Let $X_1, X_2, \dots, X_n$ be a sample from PMF

$$P_N(k) = \begin{cases} \dfrac{1}{N}, & k = 1, 2, \dots, N, \\ 0, & \text{otherwise.} \end{cases}$$

The likelihood function is

$$L(N; k_1, k_2, \dots, k_n) = \begin{cases} \dfrac{1}{N^n}, & 1 \le \max(k_1, \dots, k_n) \le N, \\ 0, & \text{otherwise.} \end{cases}$$

Clearly, the MLE of $N$ is given by

$$\hat N(X_1, X_2, \dots, X_n) = \max(X_1, X_2, \dots, X_n),$$

for if we take any $\hat\alpha < \hat N$ as the MLE, then $P_{\hat\alpha}(k_1, k_2, \dots, k_n) = 0$; and if we take
any $\hat\beta > \hat N$ as the MLE, then $P_{\hat\beta}(k_1, k_2, \dots, k_n) = 1/(\hat\beta)^n < 1/(\hat N)^n = P_{\hat N}(k_1, k_2,
\dots, k_n)$.


We see that the MLE $\hat N$ is consistent, sufficient, and complete, but not unbiased.


Example 4. Consider the hypergeometric PMF

$$P_N(x) = \begin{cases} \dfrac{\dbinom{M}{x}\dbinom{N-M}{n-x}}{\dbinom{N}{n}}, & \max(0, n - N + M) \le x \le \min(n, M), \\ 0, & \text{otherwise.} \end{cases}$$

To find the MLE $\hat N = \hat N(X)$ of $N$, consider the ratio

$$R(N) = \frac{P_N(x)}{P_{N-1}(x)} = \frac{N - n}{N}\cdot\frac{N - M}{N - M - n + x}.$$

For values of $N$ for which $R(N) > 1$, $P_N(x)$ increases with $N$, and for values of
$N$ for which $R(N) < 1$, $P_N(x)$ is a decreasing function of $N$:

$$R(N) > 1 \quad\text{if and only if}\quad N < \frac{nM}{x}$$

and

$$R(N) < 1 \quad\text{if and only if}\quad N > \frac{nM}{x}.$$

It follows that $P_N(x)$ reaches its maximum value where $N \approx nM/x$. Thus $\hat N(X) =
[nM/X]$, where $[x]$ denotes the largest integer $\le x$.
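
The formula $\hat N = [nM/x]$ of Example 4 can be compared with a brute-force search over $N$. The sketch below assumes $x \ge 1$ and uses a search cap `N_max`, which is an arbitrary illustrative choice; when $nM/x$ is an integer, $R(N) = 1$ there, so $N = nM/x$ and $N = nM/x - 1$ give the same likelihood.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, M, n):
    """P_N(x): probability of x marked items in a sample of n from N items, M of them marked."""
    if x < max(0, n - N + M) or x > min(n, M):
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def mle_population_size(x, M, n, N_max=1000):
    """Brute-force maximizer of P_N(x) over N (assumes x >= 1 and the maximizer is <= N_max)."""
    smallest = M + (n - x)                   # at least n - x unmarked items are needed
    return max(range(smallest, N_max + 1), key=lambda N: hypergeom_pmf(x, N, M, n))


x, M, n = 3, 10, 7
print(mle_population_size(x, M, n), (n * M) // x)   # both equal [nM/x]
```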


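Example 5. Let $X_1, X_2, \dots, X_n$ be a sample from $U[\theta - \frac12, \theta + \frac12]$. The likelihood
function is

$$L(\theta; x_1, x_2, \dots, x_n) = \begin{cases} 1 & \text{if } \theta - \frac12 \le \min(x_1, \dots, x_n) \le \max(x_1, \dots, x_n) \le \theta + \frac12, \\ 0 & \text{otherwise.} \end{cases}$$

Thus $L(\theta; \mathbf{x})$ attains its maximum provided that

$$\theta - \tfrac12 \le \min(x_1, \dots, x_n) \quad\text{and}\quad \theta + \tfrac12 \ge \max(x_1, \dots, x_n),$$

or when

$$\theta \le \min(x_1, \dots, x_n) + \tfrac12 \quad\text{and}\quad \theta \ge \max(x_1, \dots, x_n) - \tfrac12.$$

It follows that every statistic $T(X_1, X_2, \dots, X_n)$ such that

$$\max_{1\le i\le n} X_i - \tfrac12 \le T(X_1, X_2, \dots, X_n) \le \min_{1\le i\le n} X_i + \tfrac12 \tag{6}$$

is an MLE of $\theta$. Indeed, for $0 < \alpha < 1$,

$$T_\alpha(X_1, \dots, X_n) = \max_{1\le i\le n} X_i - \tfrac12 + \alpha\left(1 + \min_{1\le i\le n} X_i - \max_{1\le i\le n} X_i\right)$$

lies in interval (6), and hence for each $\alpha$, $0 < \alpha < 1$, $T_\alpha(X_1, \dots, X_n)$ is an MLE
of $\theta$. In particular, if $\alpha = \frac12$,

$$T_{1/2}(X_1, \dots, X_n) = \frac{\min X_i + \max X_i}{2}$$

is an MLE of $\theta$.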


Example 6. Let $X \sim b(1, p)$, $p \in [\frac14, \frac34]$. In this case $L(p; x) = p^x(1-p)^{1-x}$,
$x = 0, 1$, and we cannot differentiate $L(p; x)$ to get the MLE of $p$, since that would
lead to $\hat p = x$, a value that does not lie in $\Theta = [\frac14, \frac34]$. We have

$$L(p; x) = \begin{cases} p, & x = 1, \\ 1 - p, & x = 0, \end{cases}$$

which is maximized if we choose $\hat p(x) = \frac14$ if $x = 0$, and $= \frac34$ if $x = 1$. Thus the
MLE of $p$ is given by

$$\hat p(X) = \frac{2X + 1}{4}.$$

Note that $E_p\hat p(X) = (2p + 1)/4$, so that $\hat p$ is biased. Also, the mean square error for
$\hat p$ is

$$E_p(\hat p(X) - p)^2 = \tfrac{1}{16}E_p(2X + 1 - 4p)^2 = \tfrac{1}{16}.$$

In the sense of the MSE, the MLE is worse than the trivial estimator $\delta(X) = \frac12$, for
$E_p(\frac12 - p)^2 = (\frac12 - p)^2 \le \frac{1}{16}$ for $p \in [\frac14, \frac34]$.
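
The two mean square errors compared in Example 6 can be verified exactly over the restricted parameter space. This sketch uses exact fractions; the function names `mse_mle` and `mse_trivial` are illustrative.

```python
from fractions import Fraction


def mse_mle(p):
    """Exact MSE of the MLE p_hat(X) = (2X + 1)/4 when X ~ b(1, p)."""
    p_hat = {0: Fraction(1, 4), 1: Fraction(3, 4)}
    return p * (p_hat[1] - p) ** 2 + (1 - p) * (p_hat[0] - p) ** 2


def mse_trivial(p):
    """MSE of the constant estimator delta(X) = 1/2."""
    return (Fraction(1, 2) - p) ** 2


for p in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)):
    print(p, mse_mle(p), mse_trivial(p))   # the MLE's MSE is 1/16 for every p
```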


Example 7. Let $X_1, X_2, \dots, X_n$ be iid $b(1, p)$ RVs, and suppose that $p \in (0, 1)$.
If $(0, 0, \dots, 0)$ $((1, 1, \dots, 1))$ is observed, $\bar X = 0$ $(\bar X = 1)$ is the MLE, which is not
an admissible value of $p$. Hence an MLE does not exist.


Example 8 (Oliver [76]). This example illustrates a distribution for which an
MLE is necessarily an actual observation, but not necessarily any particular observation.
Let $X_1, X_2, \dots, X_n$ be a sample from the PDF

$$f_\theta(x) = \begin{cases} \dfrac{2}{\alpha}\,\dfrac{x}{\theta}, & 0 \le x \le \theta, \\[2mm] \dfrac{2}{\alpha}\,\dfrac{\alpha - x}{\alpha - \theta}, & \theta \le x \le \alpha, \\[2mm] 0, & \text{otherwise,} \end{cases}$$

where $\alpha > 0$ is a (known) constant. The likelihood function is

$$L(\theta; x_1, x_2, \dots, x_n) = \left(\frac{2}{\alpha}\right)^n\prod_{x_i\le\theta}\frac{x_i}{\theta}\prod_{x_i>\theta}\frac{\alpha - x_i}{\alpha - \theta},$$

where we have assumed that observations are arranged in increasing order of magnitude,
$0 \le x_1 < x_2 < \cdots < x_n \le \alpha$. Clearly, $L$ is continuous in $\theta$ (even for
$\theta =$ some $x_i$) and differentiable for values of $\theta$ between any two $x_i$'s. Thus, for
$x_j < \theta < x_{j+1}$, we have

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n\theta^{-j}(\alpha - \theta)^{-(n-j)}\prod_{i=1}^{j}x_i\prod_{i=j+1}^{n}(\alpha - x_i),$$

$$\frac{\partial\log L}{\partial\theta} = -\frac{j}{\theta} + \frac{n-j}{\alpha - \theta} \qquad\text{and}\qquad \frac{\partial^2\log L}{\partial\theta^2} = \frac{j}{\theta^2} + \frac{n-j}{(\alpha - \theta)^2} > 0.$$

It follows that any stationary value that exists must be a minimum, so that there can
be no maximum in any range $x_j < \theta < x_{j+1}$. Moreover, there can be no maximum
in $0 \le \theta < x_1$ or $x_n < \theta \le \alpha$. This follows since for $0 \le \theta < x_1$,

$$L(\theta) = \left(\frac{2}{\alpha}\right)^n(\alpha - \theta)^{-n}\prod_{i=1}^{n}(\alpha - x_i)$$

is a strictly increasing function of $\theta$. By symmetry, $L(\theta)$ is a strictly decreasing
function of $\theta$ in $x_n < \theta \le \alpha$. We conclude that an MLE has to be one of the
observations.

In particular, let $\alpha = 5$ and $n = 3$, and suppose that the observations, arranged in
increasing order of magnitude, are 1, 2, 4. In this case the MLE can be shown to be
$\hat\theta = 1$, which corresponds to the first-order statistic. If the sample values are 2, 3, 4,
the third-order statistic is the MLE.
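
Since the MLE in Example 8 must be one of the observations, it can be found by evaluating the likelihood at each observation. The following sketch does this for the two samples quoted above; the function names are illustrative.

```python
def likelihood(theta, xs, alpha):
    """Likelihood of theta for the triangular density on [0, alpha] with mode theta."""
    L = 1.0
    for x in xs:
        if x <= theta:
            L *= (2 / alpha) * x / theta
        else:
            L *= (2 / alpha) * (alpha - x) / (alpha - theta)
    return L


def mle_from_observations(xs, alpha):
    """Search only over the observed values, as justified by the argument in the example."""
    return max(xs, key=lambda t: likelihood(t, xs, alpha))


print(mle_from_observations([1, 2, 4], alpha=5))   # first order statistic
print(mle_from_observations([2, 3, 4], alpha=5))   # third order statistic
```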


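Example 9. Let $X_1, X_2, \dots, X_n$ be a sample from $G(r, 1/\beta)$; $\beta > 0$ and $r > 0$
are both unknown. The likelihood function is

$$L(\beta, r; x_1, x_2, \dots, x_n) = \begin{cases} \dfrac{\beta^{nr}}{\{\Gamma(r)\}^n}\displaystyle\prod_{i=1}^{n}x_i^{r-1}\exp\left(-\beta\sum_{i=1}^{n}x_i\right), & x_i > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then

$$\log L(\beta, r) = nr\log\beta - n\log\Gamma(r) + (r - 1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n}x_i,$$

$$\frac{\partial\log L(\beta, r)}{\partial\beta} = \frac{nr}{\beta} - \sum_{i=1}^{n}x_i = 0,$$

and

$$\frac{\partial\log L(\beta, r)}{\partial r} = n\log\beta - n\frac{\Gamma'(r)}{\Gamma(r)} + \sum_{i=1}^{n}\log x_i = 0.$$

The first of the likelihood equations yields $\hat\beta(X_1, X_2, \dots, X_n) = \hat r/\bar X$, while the second
gives

$$n\log\frac{r}{\bar X} + \sum_{i=1}^{n}\log X_i - n\frac{\Gamma'(r)}{\Gamma(r)} = 0,$$

that is,

$$\log r - \frac{\Gamma'(r)}{\Gamma(r)} = \log\bar X - \frac{1}{n}\sum_{i=1}^{n}\log X_i,$$

which is to be solved for $\hat r$. In this case, the likelihood equation is not easily solvable
and it is necessary to resort to numerical methods, using tables for $\Gamma'(r)/\Gamma(r)$.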
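
In place of tables, the equation for $\hat r$ in Example 9 can be solved numerically. The sketch below is one possible approach, not the method of the text: it approximates $\Gamma'(r)/\Gamma(r)$ by a central difference of `math.lgamma` and uses bisection, relying on the fact that $\log r - \Gamma'(r)/\Gamma(r)$ decreases from $+\infty$ to 0. The bracket `[lo, hi]` and tolerances are arbitrary assumptions.

```python
import math


def digamma(r, h=1e-5):
    """Central-difference approximation of Gamma'(r)/Gamma(r) using math.lgamma."""
    return (math.lgamma(r + h) - math.lgamma(r - h)) / (2 * h)


def gamma_mle(xs, lo=1e-3, hi=1e3, tol=1e-10):
    """Solve log r - digamma(r) = log(mean) - mean(log x) by bisection; return (r_hat, beta_hat)."""
    n = len(xs)
    xbar = sum(xs) / n
    target = math.log(xbar) - sum(math.log(x) for x in xs) / n

    def g(r):
        return math.log(r) - digamma(r) - target   # decreasing in r

    if not (g(lo) > 0 > g(hi)):
        raise ValueError("root is not bracketed; widen [lo, hi] or check the data")
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    r_hat = (lo + hi) / 2
    return r_hat, r_hat / xbar                      # beta_hat = r_hat / X-bar


r_hat, beta_hat = gamma_mle([1.0, 2.0, 3.0, 4.0])
print(r_hat, beta_hat)
```

A production implementation would normally use a library digamma function; the finite-difference version is used here only to keep the sketch free of external dependencies.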


Remark 2. We have seen that MLEs may not be unique, although frequently they 
are. Also, they are not necessarily unbiased even if a unique MLE exists. In terms of 
MSE, an MLE may be worthless. Moreover, MLEs may not even exist. We have also 
seen that MLEs are functions of sufficient statistics. This is a general result, which 
we now prove. 


Theorem 1. Let $T$ be a sufficient statistic for the family of PDFs (PMFs) $\{f_\theta :
\theta \in \Theta\}$. If a unique MLE of $\theta$ exists, it is a (nonconstant) function of $T$. If an MLE of
$\theta$ exists but is not unique, one can find an MLE that is a function of $T$.

Proof. Since $T$ is sufficient, we can write

$$L(\theta) = f_\theta(\mathbf{x}) = h(\mathbf{x})\,g_\theta(T(\mathbf{x})),$$

for all $\mathbf{x}$, all $\theta$, and some $h$ and $g_\theta$. If a unique MLE $\hat\theta$ exists that maximizes $L(\theta)$,
it also maximizes $g_\theta(T(\mathbf{x}))$ and hence $\hat\theta$ is a function of $T$. If an MLE of $\theta$ exists but
is not unique, we choose a particular MLE $\hat\theta$ from the set of all MLEs which is a
function of $T$.


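Example 10. Let $X_1, X_2, \dots, X_n$ be a random sample from $U[\theta - 1, \theta + 1]$, $\theta \in \mathcal{R}$.
Then the likelihood function is given by

$$L(\theta; \mathbf{x}) = \left(\frac{1}{2}\right)^n I_{[\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1]}.$$

We note that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is jointly sufficient for $\theta$ and any $\theta$ satisfying

$$\theta - 1 \le x_{(1)} \le x_{(n)} \le \theta + 1,$$

or, equivalently,

$$x_{(n)} - 1 \le \theta \le x_{(1)} + 1,$$

maximizes the likelihood and hence is an MLE for $\theta$. Thus, for $0 \le \alpha \le 1$,

$$\hat\theta_\alpha = \alpha(X_{(n)} - 1) + (1 - \alpha)(X_{(1)} + 1)$$

is an MLE of $\theta$. If $\alpha$ is a constant independent of the $X$'s, then $\hat\theta_\alpha$ is a function of $T$.
If, on the other hand, $\alpha$ depends on the $X$'s, then $\hat\theta_\alpha$ may not be a function of $T$ alone.
For example,

$$\hat\theta_\alpha = (\sin^2 X_1)(X_{(n)} - 1) + (\cos^2 X_1)(X_{(1)} + 1)$$

is an MLE of $\theta$ but not a function of $T$ alone.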
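Theorem 2. Suppose that the regularity conditions of the FCR inequality are satisfied
and $\theta$ belongs to an open interval on the real line. If an estimator $\hat\theta$ of $\theta$ attains
the FCR lower bound for the variance, the likelihood equation has a unique solution
$\hat\theta$ that maximizes the likelihood.


Proof. If $\hat\theta$ attains the FCR lower bound, we have [see (8.5.8)]

$$\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta} = [k(\theta)]^{-1}[\hat\theta(\mathbf{X}) - \theta]$$

with probability 1, and the likelihood equation has a unique solution $\theta = \hat\theta$.
Let us write $A(\theta) = [k(\theta)]^{-1}$. Then

$$\frac{\partial^2\log f_\theta(\mathbf{X})}{\partial\theta^2} = A'(\theta)(\hat\theta - \theta) - A(\theta),$$

so that

$$\left.\frac{\partial^2\log f_\theta(\mathbf{X})}{\partial\theta^2}\right|_{\theta=\hat\theta} = -A(\hat\theta).$$

We need only to show that $A(\theta) > 0$.
Recall from (8.5.4) with $\psi(\theta) = \theta$ that

$$E_\theta\left\{[T(\mathbf{X}) - \theta]\,\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right\} = 1,$$

and substituting $T(\mathbf{X}) - \theta = k(\theta)[\partial\log f_\theta(\mathbf{X})/\partial\theta]$, we get

$$k(\theta)\,E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 = 1.$$

That is,

$$A(\theta) = E_\theta\left[\frac{\partial\log f_\theta(\mathbf{X})}{\partial\theta}\right]^2 > 0,$$

and the proof is complete.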


Remark 3. In Theorem 2 we assumed the differentiability of $A(\theta)$ and the existence
of the second-order partial derivative $\partial^2\log f_\theta/\partial\theta^2$. If the conditions of Theorem 2
are satisfied, the most efficient estimator is necessarily the MLE. It does not
follow, however, that every MLE is most efficient. For example, in sampling from
a normal population, $\hat\sigma^2 = \sum_{1}^{n}(X_i - \bar X)^2/n$ is the MLE of $\sigma^2$, but it is not most
efficient. Since $\sum(X_i - \bar X)^2/\sigma^2$ is $\chi^2(n-1)$, we see that $\operatorname{var}(\hat\sigma^2) = 2(n-1)\sigma^4/n^2$,
which is not equal to the FCR lower bound, $2\sigma^4/n$. Note that $\hat\sigma^2$ is not even an
unbiased estimator of $\sigma^2$.


We next consider an important property of MLEs that is not shared by other methods of estimation. Often the parameter of interest is not $\theta$ but some function $h(\theta)$. If $\hat\theta$ is the MLE of $\theta$, what is the MLE of $h(\theta)$? If $\lambda = h(\theta)$ is a one-to-one function of $\theta$, the inverse function $h^{-1}(\lambda) = \theta$ is well defined and we can write the likelihood function as a function of $\lambda$. We have

$$L^*(\lambda; x) = L(h^{-1}(\lambda); x)$$

so that

$$\sup_{\lambda} L^*(\lambda; x) = \sup_{\lambda} L(h^{-1}(\lambda); x) = \sup_{\theta} L(\theta; x).$$

It follows that the supremum of $L^*$ is achieved at $\lambda = h(\hat\theta)$. Thus $h(\hat\theta)$ is the MLE of $h(\theta)$.

In many applications $\lambda = h(\theta)$ is not one-to-one. It is still tempting to take $\hat\lambda = h(\hat\theta)$ as the MLE of $\lambda$. The following result provides a justification.


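Theorem 3 (Zehna [121]). Let $\{f_\theta: \theta \in \Theta\}$ be a family of PDFs (PMFs), and let $L(\theta)$ be the likelihood function. Suppose that $\Theta \subseteq \mathcal{R}_k$, $k \ge 1$. Let $h: \Theta \to \Lambda$ be a mapping of $\Theta$ onto $\Lambda$, where $\Lambda$ is an interval in $\mathcal{R}_p$ $(1 \le p \le k)$. If $\hat\theta$ is an MLE of $\theta$, then $h(\hat\theta)$ is an MLE of $h(\theta)$.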
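Proof. For each $\lambda \in \Lambda$, let us define

$$\Theta_\lambda = \{\theta: \theta \in \Theta,\ h(\theta) = \lambda\}$$

and

$$M(\lambda; x) = \sup_{\theta \in \Theta_\lambda} L(\theta; x).$$

Then $M$ defined on $\Lambda$ is called the likelihood function induced by $h$. If $\hat\theta$ is any MLE of $\theta$, then $\hat\theta$ belongs to one and only one set, $\Theta_{\hat\lambda}$ say. Since $\hat\theta \in \Theta_{\hat\lambda}$, $\hat\lambda = h(\hat\theta)$. Now

$$M(\hat\lambda; x) = \sup_{\theta \in \Theta_{\hat\lambda}} L(\theta; x) \ge L(\hat\theta; x)$$

and $\hat\lambda$ maximizes $M$, since

$$M(\hat\lambda; x) \le \sup_{\lambda \in \Lambda} M(\lambda; x) = \sup_{\theta \in \Theta} L(\theta; x) = L(\hat\theta; x),$$

so that $M(\hat\lambda; x) = \sup_{\lambda \in \Lambda} M(\lambda; x)$. It follows that $\hat\lambda$ is an MLE of $h(\theta)$, where $\hat\lambda = h(\hat\theta)$.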


Example 11. Let $X \sim b(1, p)$, $0 \le p \le 1$, and let $h(p) = \operatorname{var}(X) = p(1 - p)$. We wish to find the MLE of $h(p)$. Note that $\Lambda = [0, \frac14]$. The function $h$ is not one-to-one. The MLE of $p$ based on a sample of size $n$ is $\hat p(X_1, \ldots, X_n) = \bar X$. Hence the MLE of parameter $h(p)$ is $h(\bar X) = \bar X(1 - \bar X)$.
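Example 11 can be checked against the induced likelihood $M(\lambda; x)$ of Theorem 3 directly. The sketch below is an illustration under stated assumptions (hypothetical data with 3 successes in 10 trials, a grid search with an arbitrary resolution, and function names of our own choosing); it maximizes the induced log-likelihood over $\Lambda = [0, \frac14]$ and compares the maximizer with the plug-in value $\bar x(1 - \bar x)$.

```python
import math


def loglik(p, s, n):
    """Bernoulli log-likelihood for s successes in n trials."""
    if p <= 0.0:
        return 0.0 if s == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if s == n else -math.inf
    return s * math.log(p) + (n - s) * math.log(1.0 - p)


def induced_loglik(lam, s, n):
    """log M(lam; x): sup of the log-likelihood over {p : p(1 - p) = lam}."""
    root = math.sqrt(max(0.0, 1.0 - 4.0 * lam))
    # The set {p : p(1 - p) = lam} has (at most) two points.
    return max(loglik((1.0 - root) / 2.0, s, n),
               loglik((1.0 + root) / 2.0, s, n))


def mle_of_variance_by_grid(s, n, steps=2500):
    """Maximize the induced likelihood over a grid on Lambda = [0, 1/4]."""
    grid = [0.25 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: induced_loglik(lam, s, n))


s, n = 3, 10  # hypothetical sample: 3 successes in 10 trials
xbar = s / n
lam_grid = mle_of_variance_by_grid(s, n)
lam_plugin = xbar * (1.0 - xbar)
print("grid maximizer of M:", lam_grid)
print("plug-in h(xbar)    :", lam_plugin)
```

Up to the grid resolution, the two values agree, as Theorem 3 asserts.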


Example 12. Consider a random sample from $G(1, \beta)$. It is required to find the MLE of $\beta$ in the following manner. A sample of size $n$ is taken, and it is known only that $k$, $0 \le k \le n$, of these observations are $\le M$, where $M$ is a fixed positive number. Let $p = P\{X_i \le M\} = 1 - e^{-M/\beta}$, so that $-M/\beta = \log(1 - p)$ and $\beta = M/\log[1/(1 - p)]$. Therefore, the MLE of $\beta$ is $M/\log[1/(1 - \hat p)]$, where $\hat p$ is the MLE of $p$. To compute the MLE of $p$ we have

$$L(p; x_1, x_2, \ldots, x_n) = p^k (1 - p)^{n-k},$$

so that the MLE of $p$ is $\hat p = k/n$. Thus the MLE of $\beta$ is

$$\hat\beta = \frac{M}{\log[n/(n - k)]}.$$
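A minimal sketch of the computation in Example 12 follows. The function name and the numerical values are illustrative assumptions; the code simply applies invariance, mapping $\hat p = k/n$ through $\beta = M/\log[1/(1-p)]$, and it rejects the boundary cases $k = 0$ and $k = n$, where the formula gives no finite positive value.

```python
import math


def mle_beta_from_count(k, n, M):
    """MLE of beta when only the count k of observations <= M is known.

    Assumes X_i ~ G(1, beta), so that P{X_i <= M} = 1 - exp(-M / beta).
    """
    if not 0 < k < n:
        raise ValueError("the formula needs 0 < k < n for a finite positive estimate")
    p_hat = k / n  # MLE of p from L(p) = p^k (1 - p)^(n - k)
    return M / math.log(1.0 / (1.0 - p_hat))


# Hypothetical data: 50 of 100 observations fall at or below M = 1.
beta_hat = mle_beta_from_count(k=50, n=100, M=1.0)
print("beta_hat:", beta_hat)
print("implied P{X <= M}:", 1.0 - math.exp(-1.0 / beta_hat))
```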

Finally, we consider some important large-sample properties of MLEs. In the following we assume that $\{f_\theta, \theta \in \Theta\}$ is a family of PDFs (PMFs), where $\Theta$ is an open interval on $\mathcal{R}$. The conditions listed below are stated when $f_\theta$ is a PDF. Modifications for the case where $f_\theta$ is a PMF are obvious and will be left to the reader.


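(i) $\partial \log f_\theta/\partial\theta$, $\partial^2 \log f_\theta/\partial\theta^2$, $\partial^3 \log f_\theta/\partial\theta^3$ exist for all $\theta \in \Theta$ and every $x$. Also,

$$\int_{-\infty}^{\infty} \frac{\partial f_\theta(x)}{\partial\theta}\,dx = E_\theta \frac{\partial \log f_\theta(X)}{\partial\theta} = 0 \quad \text{for all } \theta \in \Theta.$$

(ii) $\displaystyle\int_{-\infty}^{\infty} \frac{\partial^2 f_\theta(x)}{\partial\theta^2}\,dx = 0$ for all $\theta \in \Theta$.

(iii) $\displaystyle -\infty < \int_{-\infty}^{\infty} \frac{\partial^2 \log f_\theta(x)}{\partial\theta^2}\, f_\theta(x)\,dx < 0$ for all $\theta$.

(iv) There exists a function $H(x)$ such that for all $\theta \in \Theta$,

$$\left|\frac{\partial^3 \log f_\theta(x)}{\partial\theta^3}\right| < H(x) \quad \text{and} \quad \int_{-\infty}^{\infty} H(x) f_\theta(x)\,dx = M(\theta) < \infty.$$

(v) There exists a function $g(\theta)$ that is positive and twice differentiable for every $\theta \in \Theta$, and a function $H(x)$ such that for all $\theta$

$$\left|\frac{\partial^2}{\partial\theta^2}\left[g(\theta)\frac{\partial \log f_\theta}{\partial\theta}\right]\right| < H(x) \quad \text{and} \quad \int_{-\infty}^{\infty} H(x) f_\theta(x)\,dx < \infty.$$

Note that the condition (v) is equivalent to condition (iv) with the added qualification that $g(\theta) = 1$.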
We state the following results without proof. 


Theorem 4 (Cramér [16])

(a) Conditions (i), (iii), and (iv) imply that with probability approaching 1, as $n \to \infty$, the likelihood equation has a consistent solution.

(b) Conditions (i) through (iv) imply that a consistent solution $\hat\theta_n$ of the likelihood equation is asymptotically normal, that is,

$$\sigma^{-1}\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{L} Z,$$

where $Z$ is $N(0, 1)$, and

$$\sigma^2 = \left\{E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2\right\}^{-1}.$$

On occasions one encounters examples where the conditions of Theorem 4 are not 
satisfied and yet a solution of the likelihood equation is consistent and asymptotically 
normal. 


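Example 13 (Kulldorf [55]). Let $X \sim N(0, \theta)$, $\theta > 0$. Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. The solution of the likelihood equation is $\hat\theta_n = \sum_{i=1}^n X_i^2/n$. Also, $EX^2 = \theta$, $\operatorname{var}(X^2) = 2\theta^2$, and

$$E_\theta\left[\frac{\partial \log f_\theta(X)}{\partial\theta}\right]^2 = \frac{1}{2\theta^2}.$$

We note that

$$\hat\theta_n \xrightarrow{\text{a.s.}} \theta$$

and

$$\sqrt{n}\,(\hat\theta_n - \theta) = \theta\sqrt{2}\;\frac{\sum_{i=1}^n X_i^2 - n\theta}{\sqrt{2n}\,\theta} \xrightarrow{L} N(0, 2\theta^2).$$

However,

$$\frac{\partial^3 \log f_\theta}{\partial\theta^3} = -\frac{1}{\theta^3} + \frac{3x^2}{\theta^4} \to \infty \quad \text{as } \theta \to 0$$

and is not bounded in $0 < \theta < \infty$. Thus condition (iv) does not hold.

The following theorem covers such cases also.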
Theorem 5 (Kulldorf [55])

(a) Conditions (i), (iii), and (v) imply that with probability approaching 1 as $n \to \infty$, the likelihood equation has a solution.

(b) Conditions (i), (ii), (iii), and (v) imply that a consistent solution of the likelihood equation is asymptotically normal.

For proofs of Theorems 4 and 5 we refer to Cramér [16, p. 500], and Kulldorf [55].


Remark 4. It is important to note that the results in Theorems 4 and 5 establish 
the consistency of some root of the likelihood equation but not necessarily that of 
the MLE when the likelihood equation has several roots. Huzurbazar [44] has shown 
that under certain conditions the likelihood equation has at most one consistent so- 
lution and that the likelihood function has a relative maximum for such a solution. 
Since there may be several solutions for which the likelihood function has relative 
maxima, Cramér’s and Huzurbazar’s results still do not imply that a solution of the 
likelihood equation that makes the likelihood function an absolute maximum is nec- 
essarily consistent. 

Wald [114] has shown that under certain conditions the MLE is strongly consis- 
tent. It is important to note that Wald does not make any differentiability assump- 
tions. 

In any event, if the MLE is a unique solution of the likelihood equation, we can 
use Theorems 4 and 5 to conclude that it is consistent and asymptotically normal. 
Note that the asymptotic variance is the same as the lower bound of the FCR in- 
equality. 


Example 14. Consider $X_1, X_2, \ldots, X_n$ iid $P(\lambda)$ RVs, $\lambda \in \Theta = (0, \infty)$. The likelihood equation has a unique solution, $\hat\lambda(x_1, \ldots, x_n) = \bar x$, which maximizes the likelihood function. We leave the reader to check that the conditions of Theorem 4 hold and that the MLE $\bar X$ is consistent and asymptotically normal with mean $\lambda$ and variance $\lambda/n$, a result that is immediate otherwise.


We leave the reader to check that in Example 13, conditions of Theorem 5 are 
satisfied. 


Remark 5. The invariance and the large-sample properties of MLEs permit us to find MLEs of parametric functions and their limiting distributions. The delta method introduced in Section 7.5 (Theorem 1) comes in handy in these applications. Suppose that in Example 13 we wish to estimate $\psi(\theta) = \theta^2$. By invariance of MLEs, the MLE of $\psi(\theta)$ is $\psi(\hat\theta_n)$, where $\hat\theta_n = \sum_{i=1}^n X_i^2/n$ is the MLE of $\theta$. Applying Theorem 7.5.1, we see that $\psi(\hat\theta_n)$ is $AN(\theta^2, 8\theta^4/n)$.

In Example 14, suppose that we wish to estimate $\psi(\lambda) = P_\lambda(X = 0) = e^{-\lambda}$. Then $\psi(\hat\lambda) = e^{-\bar X}$ is the MLE of $\psi(\lambda)$ and, in view of Theorem 7.5.1, $\psi(\hat\lambda) \sim AN(e^{-\lambda}, \lambda e^{-2\lambda}/n)$.
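The two asymptotic variances quoted in Remark 5 can be written out step by step. This is a sketch which assumes that Theorem 7.5.1 is the usual one-dimensional delta method, and which takes the limit laws of $\hat\theta_n$ and $\bar X$ from Examples 13 and 14.

```latex
\begin{align*}
% Example 13: psi(theta) = theta^2
\sqrt{n}\,(\hat\theta_n - \theta) &\xrightarrow{L} N(0,\, 2\theta^2),
  & \psi(\theta) &= \theta^2, \quad \psi'(\theta) = 2\theta, \\
\sqrt{n}\,\bigl(\hat\theta_n^{\,2} - \theta^2\bigr)
  &\xrightarrow{L} N\bigl(0,\, [\psi'(\theta)]^2 \cdot 2\theta^2\bigr)
  = N(0,\, 8\theta^4). \\[4pt]
% Example 14: psi(lambda) = exp(-lambda)
\sqrt{n}\,(\bar X - \lambda) &\xrightarrow{L} N(0,\, \lambda),
  & \psi(\lambda) &= e^{-\lambda}, \quad \psi'(\lambda) = -e^{-\lambda}, \\
\sqrt{n}\,\bigl(e^{-\bar X} - e^{-\lambda}\bigr)
  &\xrightarrow{L} N\bigl(0,\, [\psi'(\lambda)]^2 \cdot \lambda\bigr)
  = N(0,\, \lambda e^{-2\lambda}).
\end{align*}
```

Dividing the limiting variances by $n$ gives the $AN$ statements of the remark.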


Remark 6. The uniqueness of the MLE does not guarantee its asymptotic normality. Consider, for example, a random sample from $U(0, \theta]$. Then $X_{(n)}$ is the unique MLE for $\theta$, and in Problem 8.2.5 we asked the reader to show that $n(\theta - X_{(n)}) \xrightarrow{L} G(1, \theta)$.
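The non-normal limit in Remark 6 can be seen from the exact tail probability $P\{n(\theta - X_{(n)}) > t\} = (1 - t/(n\theta))^n$, which follows from $P\{X_{(n)} \le y\} = (y/\theta)^n$. The sketch below (function names and numerical values are our own illustrative choices) compares this exact tail with the exponential limit $e^{-t/\theta}$.

```python
import math


def tail_exact(t, n, theta):
    """P{ n(theta - X_(n)) > t } for a sample of size n from U(0, theta]."""
    if t <= 0.0:
        return 1.0
    if t >= n * theta:
        return 0.0
    return (1.0 - t / (n * theta)) ** n


def tail_limit(t, theta):
    """Tail of the limiting G(1, theta) law (exponential with mean theta)."""
    return math.exp(-t / theta) if t > 0.0 else 1.0


theta, t = 2.0, 1.0  # hypothetical values for the illustration
for n in (10, 100, 1000, 10**6):
    print(n, tail_exact(t, n, theta), tail_limit(t, theta))
```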
PROBLEMS 8.7

1. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PMF (PDF) $f_\theta(x)$. Find an MLE for $\theta$ in each of the following cases:

(a) $f_\theta(x) = \frac12 e^{-|x - \theta|}$, $-\infty < x < \infty$.
(b) $f_\theta(x) = e^{-x+\theta}$, $\theta \le x < \infty$.
(c) $f_\theta(x) = (\theta\alpha)x^{\alpha-1}e^{-\theta x^\alpha}$, $x > 0$, and $\alpha$ known.
(d) $f_\theta(x) = \theta(1 - x)^{\theta-1}$, $0 < x < 1$, $\theta > 1$.

2. Find an MLE, if it exists, in each of the following cases:

(a) $X \sim b(n, \theta)$: both $n$ and $\theta \in [0, 1]$ are unknown, and one observation is available.
(b) $X_1, X_2, \ldots, X_n \sim b(1, \theta)$, $\theta \in [\frac12, \frac34]$.
(c) $X_1, X_2, \ldots, X_n \sim N(\theta, \theta^2)$, $\theta \in \mathcal{R}$.
(d) $X_1, X_2, \ldots, X_n$ is a sample from

$$P\{X = y_1\} = \frac{1 - \theta}{2}, \quad P\{X = y_2\} = \frac12, \quad P\{X = y_3\} = \frac{\theta}{2} \quad (0 < \theta < 1).$$

(e) $X_1, X_2, \ldots, X_n \sim N(\theta, \theta)$, $0 < \theta < \infty$.
(f) $X \sim C(\theta, 0)$.

3. Suppose that $n$ observations are taken on an RV $X$ with distribution $N(\mu, 1)$, but instead of recording all the observations, one notes only whether or not the observation is less than 0. If $\{X < 0\}$ occurs $m\ (< n)$ times, find the MLE of $\mu$.

4. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

$$f(x; \alpha, \beta) = \beta^{-1}e^{-\beta^{-1}(x - \alpha)}, \quad \alpha < x < \infty, \quad -\infty < \alpha < \infty, \quad \beta > 0.$$

(a) Find the MLE of $(\alpha, \beta)$.
(b) Find the MLE of $P_{\alpha,\beta}\{X_1 \ge 1\}$.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from exponential density $f_\theta(x) = \theta e^{-\theta x}$, $x > 0$, $\theta > 0$. Find the MLE of $\theta$, and show that it is consistent and asymptotically normal.

6. For Problem 8.6.5 find the MLE for $(\mu, \sigma^2)$.

7. For a sample of size 1 taken from $N(\mu, \sigma^2)$, show that no MLE of $(\mu, \sigma^2)$ exists.

8. For Problem 5.2.5 suppose that we wish to estimate $N$ on the basis of observations $X_1, X_2, \ldots, X_m$.
(a) Find the UMVUE of $N$.
(b) Find the MLE of $N$.
(c) Compare the MSEs of the UMVUE and the MLE.
9. Let $X_{ij}$ $(i = 1, 2, \ldots, s;\ j = 1, 2, \ldots, n)$ be independent RVs where $X_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, s$. Find MLEs for $\mu_1, \mu_2, \ldots, \mu_s$, and $\sigma^2$. Show that the MLE for $\sigma^2$ is not consistent as $s \to \infty$ ($n$ fixed). (Neyman and Scott [75])

10. Let $(X, Y)$ have a bivariate normal distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Suppose that $n$ observations are made on the pair $(X, Y)$, and $N - n$ observations on $X$; that is, $N - n$ observations on $Y$ are missing. Find the MLEs of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. [Hint: If $f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ is the joint PDF of $(X, Y)$, write

$$f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = f_1(x; \mu_1, \sigma_1^2)\, f_{Y|X}(y \mid \beta_x, \sigma_2^2(1 - \rho^2)),$$

where $f_1$ is the marginal (normal) PDF of $X$, and $f_{Y|X}$ is the conditional (normal) PDF of $Y$, given $x$, with mean

$$\beta_x = \left(\mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1\right) + \rho\frac{\sigma_2}{\sigma_1}x$$

and variance $\sigma_2^2(1 - \rho^2)$. Maximize the likelihood function first with respect to $\mu_1$ and $\sigma_1^2$ and then with respect to $\mu_2 - \rho(\sigma_2/\sigma_1)\mu_1$, $\rho\sigma_2/\sigma_1$, and $\sigma_2^2(1 - \rho^2)$.] (Anderson [1])

11. In Problem 5, let $\hat\theta$ denote the MLE of $\theta$. Find the MLE of $\mu = EX_1 = 1/\theta$ and its asymptotic distribution.

12. In Problem 1(d), find the asymptotic distribution of the MLE of $\theta$.

13. In Problem 2(a), find the MLE of $d(\theta) = \theta^2$ and its asymptotic distribution.

14. Let $X_1, X_2, \ldots, X_n$ be a random sample from some DF $F$ on the real line. Suppose that we observe $x_1, x_2, \ldots, x_n$, which are all different. Show that the MLE of $F$ is $F_n^*$, the empirical DF of the sample.

15. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$. Suppose that $\Theta = \{\mu \ge 0\}$. Find the MLE of $\mu$.

16. Let $(X_1, X_2, \ldots, X_{k-1})$ have a multinomial distribution with parameters $n, p_1, \ldots, p_{k-1}$, $0 \le p_1, p_2, \ldots, p_{k-1} \le 1$, $\sum_{j=1}^{k-1} p_j \le 1$, where $n$ is known. Find the MLE of $(p_1, p_2, \ldots, p_{k-1})$.

17. Consider the one-parameter exponential density introduced in Section 5.5 in its natural form with the PDF

$$f_\eta(x) = \exp[\eta T(x) + D(\eta) + S(x)].$$

(a) Show that the MGF of $T(X)$ is given by

$$M(t) = \exp[D(\eta) - D(\eta + t)]$$

for $t$ in some neighborhood of the origin. Moreover, $E_\eta T(X) = -D'(\eta)$, and $\operatorname{var}(T(X)) = -D''(\eta)$.

(b) If the equation $E_\eta T(X) = T(x)$ has a solution, it must be the unique MLE of $\eta$.

18. In Problem 1(b), show that the unique MLE of $\theta$ is consistent. Is it asymptotically normal?


8.8 BAYES AND MINIMAX ESTIMATION 


In this section we consider the problem of point estimation in a decision-theoretic 
setting. We consider here Bayes and minimax estimation. 

Let $\{f_\theta: \theta \in \Theta\}$ be a family of PDFs (PMFs), and $X_1, X_2, \ldots, X_n$ be a sample from this distribution. Once the sample point $(x_1, x_2, \ldots, x_n)$ is observed, the statistician takes an action on the basis of these data. Let us denote by $\mathcal{A}$ the set of all actions or decisions open to the statistician.


Definition 1. A decision function $\delta$ is a statistic that takes values in $\mathcal{A}$; that is, $\delta$ is a Borel-measurable function that maps $\mathcal{R}_n$ into $\mathcal{A}$.

If $X = x$ is observed, the statistician takes action $\delta(x) \in \mathcal{A}$.

Example 1. Let $\mathcal{A} = \{a_1, a_2\}$. Then any decision function $\delta$ partitions the space of values of $(X_1, \ldots, X_n)$, namely, $\mathcal{R}_n$, into a set $C$ and its complement $C^c$, such that if $x \in C$, we take action $a_1$, and if $x \in C^c$ action $a_2$ is taken. This is the problem of testing hypotheses, which we discuss in Chapter 9.

Example 2. Let $\mathcal{A} = \Theta$. In this case we face the problem of estimation.


Another element of decision theory is the specification of a loss function, which 
measures the loss incurred when we take a decision. 


Definition 2. Let $\mathcal{A}$ be an arbitrary space of actions. A nonnegative function $L$ that maps $\Theta \times \mathcal{A}$ into $\mathcal{R}$ is called a loss function.

The value $L(\theta, a)$ is the loss to the statistician if he takes action $a$ when $\theta$ is the true parameter value. If we use the decision function $\delta(X)$ and loss function $L$ and $\theta$ is the true parameter value, the loss is the RV $L(\theta, \delta(X))$. (As always, we will assume that $L$ is a Borel-measurable function.)


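Definition 3. Let $\mathcal{D}$ be a class of decision functions that map $\mathcal{R}_n$ into $\mathcal{A}$, and let $L$ be a loss function on $\Theta \times \mathcal{A}$. The function $R$ defined on $\Theta \times \mathcal{D}$ by

$$R(\theta, \delta) = E_\theta L(\theta, \delta(X)) \tag{1}$$

is known as the risk function associated with $\delta$ at $\theta$.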


Example 3. Let $\mathcal{A} = \Theta \subseteq \mathcal{R}$, $L(\theta, a) = |\theta - a|^2$. Then

$$R(\theta, \delta) = E_\theta L(\theta, \delta(X)) = E_\theta\{\delta(X) - \theta\}^2,$$

which is just the MSE. If we restrict attention to estimators that are unbiased, the risk is just the variance of the estimator.


The basic problem of decision theory is the following: Given a space of actions $\mathcal{A}$, and a loss function $L(\theta, a)$, find a decision function $\delta$ in $\mathcal{D}$ such that the risk $R(\theta, \delta)$ is "minimum" in some sense for all $\theta \in \Theta$. We need first to specify some criterion for comparing the decision functions $\delta$.


Definition 4. The principle of minimax is to choose $\delta^* \in \mathcal{D}$ so that

$$\max_\theta R(\theta, \delta^*) \le \max_\theta R(\theta, \delta) \tag{2}$$

for all $\delta$ in $\mathcal{D}$. Such a rule $\delta^*$, if it exists, is called a minimax (decision) rule.

If the problem is one of estimation, that is, if $\mathcal{A} = \Theta$, we call $\delta^*$ satisfying (2) a minimax estimator of $\theta$.


Example 4. Let $X \sim b(1, p)$, $p \in \Theta = \{\frac14, \frac12\}$ and $\mathcal{A} = \{a_1, a_2\}$. Let the loss function be defined as follows.

                  a_1    a_2
    p_1 = 1/4      1      4
    p_2 = 1/2      3      2

The set of decision rules includes four functions: $\delta_1, \delta_2, \delta_3, \delta_4$, defined by $\delta_1(0) = \delta_1(1) = a_1$; $\delta_2(0) = a_1$, $\delta_2(1) = a_2$; $\delta_3(0) = a_2$, $\delta_3(1) = a_1$; and $\delta_4(0) = \delta_4(1) = a_2$. The risk function takes the following values:

    i    R(p_1, δ_i)    R(p_2, δ_i)    max over p of R(p, δ_i)    min over i of the max
    1        1              3                  3
    2       7/4            5/2                5/2                        5/2
    3      13/4            5/2               13/4
    4        4              2                  4

Thus the minimax solution is $\delta_2(x) = a_1$ if $x = 0$ and $= a_2$ if $x = 1$.
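The risk table of Example 4 can be recomputed by enumerating the four decision rules. In the sketch below the loss entries are an assumption: they are the values implied by the rows for $\delta_1$ and $\delta_4$ of the risk table, since a constant rule has risk equal to the loss of its single action. The variable and function names are ours.

```python
from fractions import Fraction as F
from itertools import product

# Parameter values of Example 4.
P_VALUES = {"p1": F(1, 4), "p2": F(1, 2)}

# Assumed loss table, inferred from the risks of the constant rules.
LOSS = {
    ("p1", "a1"): F(1), ("p1", "a2"): F(4),
    ("p2", "a1"): F(3), ("p2", "a2"): F(2),
}


def risk(p_name, rule):
    """R(p, delta) = sum over x in {0, 1} of P_p{X = x} * L(p, delta(x))."""
    p = P_VALUES[p_name]
    pmf = {0: 1 - p, 1: p}
    return sum(pmf[x] * LOSS[(p_name, rule[x])] for x in (0, 1))


# delta_1, ..., delta_4 in the order used in the text: (delta(0), delta(1)).
rules = {
    i + 1: {0: acts[0], 1: acts[1]}
    for i, acts in enumerate(product(("a1", "a2"), repeat=2))
}

max_risk = {i: max(risk(p, r) for p in P_VALUES) for i, r in rules.items()}
minimax_rule = min(max_risk, key=max_risk.get)

for i, r in rules.items():
    print(i, risk("p1", r), risk("p2", r), max_risk[i])
print("minimax rule: delta_%d" % minimax_rule)
```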


The computation of minimax estimators is facilitated by the use of the Bayes estimation method. So far, we have considered $\theta$ as a fixed constant and $f_\theta(x)$ has represented the PDF (PMF) of the RV $X$. In Bayesian estimation we treat $\theta$ as a random variable distributed according to PDF (PMF) $\pi(\theta)$ on $\Theta$. Also, $\pi$ is called the a priori distribution. Now $f(x \mid \theta)$ represents the conditional probability density (or mass) function of RV $X$, given that $\theta \in \Theta$ is held fixed. Since $\pi$ is the distribution of $\theta$, it follows that the joint density (PMF) of $\theta$ and $X$ is given by

$$f(x, \theta) = \pi(\theta) f(x \mid \theta). \tag{3}$$

In this framework $R(\theta, \delta)$ is the conditional average loss, $E\{L(\theta, \delta(X)) \mid \theta\}$, given that $\theta$ is held fixed. (Note that we are using the same symbol to denote the RV $\theta$ and a value assumed by it.)

Definition 5. The Bayes risk of a decision function $\delta$ is defined by

$$R(\pi, \delta) = E_\pi R(\theta, \delta). \tag{4}$$

If $\theta$ is a continuous RV and $X$ is of the continuous type, then

$$R(\pi, \delta) = \int R(\theta, \delta)\pi(\theta)\,d\theta = \iint L(\theta, \delta(x)) f(x \mid \theta)\pi(\theta)\,dx\,d\theta = \iint L(\theta, \delta(x)) f(x, \theta)\,dx\,d\theta. \tag{5}$$

If $\theta$ is discrete with PMF $\pi$ and $X$ is of the discrete type, then

$$R(\pi, \delta) = \sum_\theta \sum_x L(\theta, \delta(x)) f(x, \theta). \tag{6}$$

Similar expressions may be written in the other two cases.


Definition 6. A decision function $\delta^*$ is known as a Bayes rule (procedure) if it minimizes the Bayes risk, that is, if

$$R(\pi, \delta^*) = \inf_\delta R(\pi, \delta). \tag{7}$$

Definition 7. The conditional distribution of RV $\theta$, given $X = x$, is called the a posteriori probability distribution of $\theta$, given the sample.


Let the joint PDF (PMF) be expressed in the form

$$f(x, \theta) = g(x) h(\theta \mid x), \tag{8}$$

where $g$ denotes the joint marginal density (PMF) of $X$. The a priori PDF (PMF) $\pi(\theta)$ gives the distribution of $\theta$ before the sample is taken, and the a posteriori PDF (PMF) $h(\theta \mid x)$ gives the distribution of $\theta$ after sampling. In terms of $h(\theta \mid x)$ we may write

$$R(\pi, \delta) = \int g(x)\left\{\int L(\theta, \delta(x)) h(\theta \mid x)\,d\theta\right\} dx \tag{9}$$

or

$$R(\pi, \delta) = \sum_x g(x)\left\{\sum_\theta L(\theta, \delta(x)) h(\theta \mid x)\right\}, \tag{10}$$

depending on whether $f$ and $\pi$ are both continuous or both discrete. Similar expressions may be written if only one of $f$ and $\pi$ is discrete.


Theorem 1. Consider the problem of estimation of a parameter $\theta \in \Theta \subseteq \mathcal{R}$ with respect to the quadratic loss function $L(\theta, \delta) = (\theta - \delta)^2$. A Bayes solution is given by

$$\delta(x) = E\{\theta \mid X = x\}. \tag{11}$$

[$\delta(x)$ defined by (11) is called the Bayes estimator.]

Proof. In the continuous case, if $\pi$ is the prior PDF of $\theta$, then

$$R(\pi, \delta) = \int g(x)\left\{\int [\theta - \delta(x)]^2 h(\theta \mid x)\,d\theta\right\} dx,$$

where $g$ is the marginal PDF of $X$, and $h$ is the conditional PDF of $\theta$, given $x$. The Bayes rule is a function $\delta$ that minimizes $R(\pi, \delta)$. Minimization of $R(\pi, \delta)$ is the same as minimization of

$$\int [\theta - \delta(x)]^2 h(\theta \mid x)\,d\theta,$$

which is minimum if and only if

$$\delta(x) = E\{\theta \mid x\}.$$

The proof for the remaining cases is similar.
Remark 1. The argument used in Theorem 1 shows that a Bayes estimator is one that minimizes $E\{L(\theta, \delta(X)) \mid X\}$. Theorem 1 is a special case which says that if $L(\theta, \delta(X)) = [\theta - \delta(X)]^2$, the function

$$\delta(x) = \int \theta\, h(\theta \mid x)\,d\theta$$

is the Bayes estimator for $\theta$ with respect to $\pi$, the a priori distribution on $\Theta$.

Remark 2. Suppose that $T(X)$ is sufficient for the parameter $\theta$. Then it is easily seen that the posterior distribution of $\theta$ given $x$ depends on $x$ only through $T$ and it follows that the Bayes estimator of $\theta$ is a function of $T$.


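Example 5. Let $X \sim b(n, p)$ and $L(p, \delta(x)) = [p - \delta(x)]^2$. Let $\pi(p) = 1$ for $0 < p < 1$ be the a priori PDF of $p$. Then

$$h(p \mid x) = \frac{\binom{n}{x} p^x (1 - p)^{n-x}}{\int_0^1 \binom{n}{x} p^x (1 - p)^{n-x}\,dp}.$$

It follows that

$$E\{p \mid x\} = \int_0^1 p\, h(p \mid x)\,dp = \frac{x + 1}{n + 2}.$$

Hence the Bayes estimator is

$$\delta^*(X) = \frac{X + 1}{n + 2}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int_0^1 \pi(p) \sum_{x=0}^{n} [\delta^*(x) - p]^2 f(x \mid p)\,dp = \int_0^1 E\left\{\left(\frac{X + 1}{n + 2} - p\right)^2 \,\Big|\, p\right\} dp$$

$$= \frac{1}{(n + 2)^2}\int_0^1 [np(1 - p) + (1 - 2p)^2]\,dp = \frac{1}{6(n + 2)}.$$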
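The closed forms in Example 5 can be verified with exact rational arithmetic. The sketch below is an illustration with names of our own choosing. It uses two facts that follow from the uniform prior: the marginal PMF of $X$ is $g(x) = 1/(n+1)$, and the posterior is beta with parameters $(x+1, n-x+1)$, so that the Bayes risk under squared-error loss is the average posterior variance.

```python
from fractions import Fraction
from math import comb


def bayes_estimate(x, n):
    """Posterior mean (x + 1)/(n + 2) under the uniform prior."""
    return Fraction(x + 1, n + 2)


def frequentist_risk(p, n):
    """R(p, delta*) = E_p[(delta*(X) - p)^2], summed over the b(n, p) PMF."""
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x) * (bayes_estimate(x, n) - p) ** 2
        for x in range(n + 1)
    )


def bayes_risk(n):
    """Average posterior variance: sum over x of g(x) * var(p | x)."""
    total = Fraction(0)
    for x in range(n + 1):
        a, b = x + 1, n - x + 1            # posterior is beta(a, b)
        post_var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
        total += Fraction(1, n + 1) * post_var   # g(x) = 1/(n + 1)
    return total


n = 5
print("Bayes risk        :", bayes_risk(n))
print("1 / (6 (n + 2))   :", Fraction(1, 6 * (n + 2)))
print("R(1/3, delta*), n=4:", frequentist_risk(Fraction(1, 3), 4))
```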


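Example 6. Let $X \sim N(\mu, 1)$, and let the a priori PDF of $\mu$ be $N(0, 1)$. Also, let $L(\mu, \delta) = [\mu - \delta(X)]^2$. Then, for a sample $x = (x_1, \ldots, x_n)$ with mean $\bar x$,

$$h(\mu \mid x) = \frac{f(x, \mu)}{g(x)} = \frac{\pi(\mu) f(x \mid \mu)}{g(x)},$$

where

$$g(x) = \int f(x, \mu)\,d\mu = \frac{1}{(2\pi)^{(n+1)/2}} \exp\left(-\frac12 \sum_{i=1}^n x_i^2\right) \int_{-\infty}^{\infty} \exp\left\{-\frac{n + 1}{2}\left(\mu^2 - \frac{2\mu n \bar x}{n + 1}\right)\right\} d\mu$$

$$= \frac{(n + 1)^{-1/2}}{(2\pi)^{n/2}} \exp\left\{-\frac12 \sum_{i=1}^n x_i^2 + \frac{n^2 \bar x^2}{2(n + 1)}\right\}.$$

It follows that

$$h(\mu \mid x) = \frac{1}{\sqrt{2\pi/(n + 1)}} \exp\left\{-\frac{n + 1}{2}\left(\mu - \frac{n \bar x}{n + 1}\right)^2\right\},$$

and the Bayes estimator is

$$\delta^*(x) = E\{\mu \mid x\} = \frac{n \bar x}{n + 1} = \frac{\sum_{i=1}^n x_i}{n + 1}.$$

The Bayes risk is

$$R(\pi, \delta^*) = \int \pi(\mu) \int [\delta^*(x) - \mu]^2 f(x \mid \mu)\,dx\,d\mu = \int_{-\infty}^{\infty} E_\mu\left\{\left(\frac{n \bar X}{n + 1} - \mu\right)^2\right\} \pi(\mu)\,d\mu$$

$$= \int_{-\infty}^{\infty} (n + 1)^{-2}(n + \mu^2)\pi(\mu)\,d\mu = \frac{1}{n + 1}.$$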

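The quadratic loss function used in Theorem 1 is but one example of a loss function in frequent use. Some of many other loss functions that may be used are

$$|\theta - \delta(X)|, \quad \frac{|\theta - \delta(X)|^2}{|\theta|}, \quad |\theta - \delta(X)|^4, \quad \text{and} \quad \left(\frac{|\theta - \delta(X)|}{|\theta| + 1}\right)^{1/2}.$$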

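Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. It is required to find a Bayes estimator of $\mu$ of the form $\delta(x_1, \ldots, x_n) = \delta(\bar x)$, where $\bar x = \sum_{i=1}^n x_i/n$, using the loss function $L(\mu, \delta) = |\mu - \delta(\bar X)|$. From the argument used in the proof of Theorem 1 (or by Remark 1), the Bayes estimator is one that minimizes the integral $\int |\mu - \delta(\bar x)|\, h(\mu \mid \bar x)\,d\mu$. This will be the case if we choose $\delta$ to be the median of the conditional distribution (see Problem 3.2.5).

Let the a priori distribution of $\mu$ be $N(\theta, \tau^2)$. Since $\bar X \sim N(\mu, \sigma^2/n)$, we have

$$f(\bar x, \mu) = \frac{\sqrt{n}}{2\pi\sigma\tau} \exp\left\{-\frac{(\mu - \theta)^2}{2\tau^2} - \frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}.$$

Writing

$$(\bar x - \mu)^2 = (\bar x - \theta + \theta - \mu)^2 = (\bar x - \theta)^2 - 2(\bar x - \theta)(\mu - \theta) + (\mu - \theta)^2,$$

we see that the exponent in $f(\bar x, \mu)$ is

$$-\frac12\left\{(\mu - \theta)^2\left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right) - \frac{2n}{\sigma^2}(\bar x - \theta)(\mu - \theta) + \frac{n}{\sigma^2}(\bar x - \theta)^2\right\}.$$

It follows that the joint PDF of $\mu$ and $\bar X$ is bivariate normal with means $\theta$, $\theta$, variances $\tau^2$, $\tau^2 + (\sigma^2/n)$, and correlation coefficient $\tau/\sqrt{\tau^2 + (\sigma^2/n)}$. The marginal of $\bar X$ is $N(\theta, \tau^2 + (\sigma^2/n))$, and the conditional distribution of $\mu$, given $\bar x$, is normal with mean

$$\theta + \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}} \cdot \frac{\tau}{\sqrt{\tau^2 + (\sigma^2/n)}}\,(\bar x - \theta) = \frac{\theta(\sigma^2/n) + \bar x \tau^2}{\tau^2 + (\sigma^2/n)}$$

and variance

$$\tau^2\left[1 - \frac{\tau^2}{\tau^2 + (\sigma^2/n)}\right] = \frac{\tau^2 \sigma^2/n}{\tau^2 + (\sigma^2/n)}$$

(see the proof of Theorem 5.4.1). The Bayes estimator is therefore the median of this conditional distribution, and since the distribution is symmetric about the mean,

$$\delta^*(\bar x) = \frac{\theta(\sigma^2/n) + \bar x \tau^2}{\tau^2 + (\sigma^2/n)}$$

is the Bayes estimator of $\mu$.

Clearly, $\delta^*$ is also the Bayes estimator under the quadratic loss function $L(\mu, \delta) = [\mu - \delta(\bar X)]^2$.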


Key to the derivation of the Bayes estimator is the a posteriori distribution, $h(\theta \mid x)$. The derivation of the a posteriori distribution $h(\theta \mid x)$, however, is a three-step process:

1. Find the joint distribution of $X$ and $\theta$ given by $\pi(\theta) f(x \mid \theta)$.

2. Find the marginal distribution with PDF (PMF) $g(x)$ by integrating (summing) over $\theta \in \Theta$.

3. Divide the joint PDF (PMF) by $g(x)$.

It is not always easy to go through these steps in practice. It may not be possible to obtain $h(\theta \mid x)$ in a closed form.


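Example 8. Let $X \sim N(\mu, 1)$ and the prior PDF of $\mu$ be given by

$$\pi(\mu) = \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2},$$

where $\theta$ is a location parameter. Then the joint PDF of $X$ and $\mu$ is given by

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \mu)^2/2}\, \frac{e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2}$$

so that the marginal PDF of $X$ is

$$g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(x - \mu)^2/2}\, e^{-(\mu - \theta)}}{[1 + e^{-(\mu - \theta)}]^2}\,d\mu.$$

A closed form for $g$ is not known.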
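When no closed form is available, the three steps (joint density, marginal, division) can be carried out numerically. The following is a minimal sketch for the logistic prior of Example 8, not a method taken from the text: the trapezoidal rule, the grid limits, and the grid size are assumptions chosen for illustration, and the function names are ours.

```python
import math


def logistic_prior(mu, theta):
    """Logistic density e^{-z} / (1 + e^{-z})^2 with z = mu - theta.

    The density is symmetric in z, so exp(-|z|) is used for numerical stability.
    """
    e = math.exp(-abs(mu - theta))
    return e / (1.0 + e) ** 2


def normal_pdf(x, mu):
    """N(mu, 1) density at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def trapezoid(values, step):
    """Trapezoidal rule on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def posterior_on_grid(x, theta, lo=-15.0, hi=15.0, m=6001):
    """Return (grid, posterior values, g(x), step) by numerical integration."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    joint = [logistic_prior(mu, theta) * normal_pdf(x, mu) for mu in grid]  # step 1
    g = trapezoid(joint, step)                                              # step 2
    post = [j / g for j in joint]                                           # step 3
    return grid, post, g, step


def posterior_mean(grid, post, step):
    """Bayes estimate under squared-error loss, computed on the grid."""
    return trapezoid([mu * h for mu, h in zip(grid, post)], step)


grid, post, g, step = posterior_on_grid(x=1.0, theta=0.0)
print("g(1)           :", g)
print("posterior mean :", posterior_mean(grid, post, step))
```

The fixed integration range is adequate only when $x$ and $\theta$ are well inside it; a wider or adaptive grid would be needed otherwise.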


To avoid a problem of integration such as that in Example 8, statisticians use conjugate prior distributions. Often, there is a natural parametric family of distributions such that the posterior distributions also belong to the same family. These priors make the computations much easier.

Definition 8. Let $X \sim f(x \mid \theta)$ and $\pi(\theta)$ be the prior distribution on $\Theta$. Then $\pi$ is said to be a conjugate prior family if the corresponding posterior distribution $h(\theta \mid x)$ belongs to the same family as $\pi(\theta)$.


Example 9. Consider Example 6, where $\pi(\mu)$ is $N(0, 1)$ and $h(\mu \mid x)$ is

$$N\left(\frac{n \bar x}{n + 1}, \frac{1}{n + 1}\right)$$

so that both $h$ and $\pi$ belong to the same family. Hence $N(0, 1)$ is a conjugate prior for $\mu$.


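Example 10. Let $X \sim b(n, p)$, $0 < p < 1$, and $\pi(p)$ be the beta PDF with parameters $(\alpha, \beta)$. Then

$$h(p \mid x) = \frac{p^{x+\alpha-1}(1 - p)^{n-x+\beta-1}}{\int_0^1 p^{x+\alpha-1}(1 - p)^{n-x+\beta-1}\,dp} = \frac{p^{x+\alpha-1}(1 - p)^{n-x+\beta-1}}{B(x + \alpha, n - x + \beta)},$$

which is also a beta density. Thus the family of beta distributions is a conjugate family of priors for $p$.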
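The update in Example 10 amounts to adding the number of successes and failures to the prior parameters. A minimal sketch follows; the function names are ours, and the data are hypothetical.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, x, n):
    """Posterior beta parameters after observing x successes in n trials."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return alpha + x, beta + n - x


def beta_mean(a, b):
    """Mean a/(a + b) of a beta(a, b) distribution (Bayes estimate, squared-error loss)."""
    return Fraction(a) / Fraction(a + b)


# Uniform prior beta(1, 1) and hypothetical data: 3 successes in 10 trials.
a_post, b_post = beta_binomial_update(1, 1, x=3, n=10)
print("posterior:", (a_post, b_post))
print("posterior mean:", beta_mean(a_post, b_post))  # equals (x + 1)/(n + 2)
```

With $\alpha = \beta = 1$ the posterior mean reduces to $(x+1)/(n+2)$, the estimator of Example 5.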
Conjugate priors are popular because whenever the prior family is parametric, the posterior distributions are always computable, $h(\theta \mid x)$ being an updated parametric version of $\pi(\theta)$. One no longer needs to go through a computation of $g$, the marginal PDF (PMF) of $X$. Once $h(\theta \mid x)$ is known, $g$, if needed, is easily determined from

$$g(x) = \frac{\pi(\theta) f(x \mid \theta)}{h(\theta \mid x)}.$$

Thus in Example 10, we see easily that $g(x) = \binom{n}{x} B(x + \alpha, n - x + \beta)/B(\alpha, \beta)$, while in Example 6 $g$ is given by

$$g(x) = \frac{1}{(n + 1)^{1/2}(2\pi)^{n/2}} \exp\left\{-\frac12 \sum_{i=1}^n x_i^2 + \frac{n^2 \bar x^2}{2(n + 1)}\right\}.$$

i=1 


Conjugate priors are usually associated with a wide class of sampling distributions, namely, the exponential family of distributions.

Natural Conjugate Priors

    Sampling PDF (PMF), f(x|θ)    Prior, π(θ)    Posterior, h(θ|x)
    N(θ, σ²)                      N(μ, τ²)       N((σ²μ + τ²x)/(σ² + τ²), σ²τ²/(σ² + τ²))
    G(ν, β)                       G(α, β)        G(α + ν, β + x)
    b(n, p)                       B(α, β)        B(α + x, β + n − x)
    P(λ)                          G(α, β)        G(α + x, β + 1)
    NB(r; p)                      B(α, β)        B(α + r, β + x)
    G(γ, 1/θ)                     G(α, β)        G(α + γ, β + x)

Another easy way is to use a noninformative prior $\pi(\theta)$, although one needs some integration to obtain $g(x)$.

Definition 9. A PDF $\pi(\theta)$ is said to be a noninformative prior if it contains no information about $\theta$; that is, the distribution does not favor any value of $\theta$ over others.

Example 11. Some simple examples of noninformative priors are $\pi(\theta) = 1$, $\pi(\theta) = 1/\theta$, and $\pi(\theta) = \sqrt{I(\theta)}$. These may quite often lead to infinite mass and the PDF may be improper (that is, does not integrate to 1).


Calculation of $h(\theta \mid x)$ becomes easier, bypassing the calculation of $g(x)$, when $f(x \mid \theta)$ is invariant under a group $\mathcal{G}$ of transformations, following Fraser's [30] structural theory.

Let $\mathcal{G}$ be a group of Borel-measurable functions on $\mathcal{R}_n$ onto itself. The group operation is composition; that is, if $g_1$ and $g_2$ are mappings from $\mathcal{R}_n$ onto $\mathcal{R}_n$, $g_2 g_1$ is defined by $g_2 g_1(x) = g_2(g_1(x))$. Also, $\mathcal{G}$ is closed under composition and inverse, so that all maps in $\mathcal{G}$ are one-to-one. We define the group $\mathcal{G}$ of affine linear transformations $g = \{a, b\}$ by

$$gx = a + bx, \quad a \in \mathcal{R}, \quad b > 0.$$

The inverse of $\{a, b\}$ is

$$\{a, b\}^{-1} = \left\{-\frac{a}{b}, \frac{1}{b}\right\},$$

and the composition of $\{a, b\}$ and $\{c, d\} \in \mathcal{G}$ is given by

$$\{a, b\}\{c, d\}(x) = \{a, b\}(c + dx) = a + b(c + dx) = (a + bc) + bdx = \{a + bc, bd\}(x).$$

In particular,

$$\{a, b\}\{a, b\}^{-1} = \{a, b\}\left\{-\frac{a}{b}, \frac{1}{b}\right\} = \{0, 1\} = e.$$
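The composition and inverse rules for the affine group can be exercised directly. The sketch below represents $\{a, b\}$ as a small class; the class and method names are our own, and the numerical values are chosen only so that the floating-point arithmetic is exact.

```python
from typing import NamedTuple


class Affine(NamedTuple):
    """The transformation {a, b}: x -> a + b*x, with b > 0."""
    a: float
    b: float

    def __call__(self, x):
        return self.a + self.b * x

    def compose(self, other):
        """{a, b}{c, d} = {a + b*c, b*d}: apply `other` first, then `self`."""
        return Affine(self.a + self.b * other.a, self.b * other.b)

    def inverse(self):
        """{a, b}^{-1} = {-a/b, 1/b}."""
        return Affine(-self.a / self.b, 1.0 / self.b)


g = Affine(2.0, 4.0)
h = Affine(1.0, 0.5)
print(g.compose(h))            # the single map equivalent to g(h(x))
print(g.compose(h)(3.0), g(h(3.0)))
print(g.compose(g.inverse()))  # identity element {0, 1}
```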


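Example 12. Let $X \sim N(\mu, 1)$ and let $\mathcal{G}$ be the group of translations $\mathcal{G} = \{\{b, 1\}, -\infty < b < \infty\}$. Let $X_1, \ldots, X_n$ be a sample from $N(\mu, 1)$. Then we may write

$$X_i = \{\mu, 1\} Z_i, \quad i = 1, \ldots, n,$$

where $Z_1, \ldots, Z_n$ are iid $N(0, 1)$. It is clear that $\bar Z \sim N(0, 1/n)$ with PDF

$$\sqrt{\frac{n}{2\pi}} \exp\left(-\frac{n}{2}\bar z^2\right)$$

and there is a one-to-one correspondence between values of $\{\bar z, 1\}$ and $\{\mu, 1\}$ given by

$$\{\bar x, 1\} = \{\mu, 1\}\{\bar z, 1\} = \{\mu + \bar z, 1\}.$$

Thus $\bar x = \mu + \bar z$ with inverse map $\bar z = \bar x - \mu$. We fix $\bar x$ and consider the variation in $\bar z$ as a function of $\mu$. Changing the PDF element of $\bar z$ to $\mu$, we get

$$\sqrt{\frac{n}{2\pi}} \exp\left[-\frac{n}{2}(\mu - \bar x)^2\right]$$

as the posterior of $\mu$ given $\bar x$ with prior $\pi(\mu) = 1$.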


Example 13. Let $X \sim N(0, \sigma^2)$ and consider the scale group $\mathcal{G} = \{\{0, c\}, c > 0\}$. Let $X_1, X_2, \ldots, X_n$ be iid $N(0, \sigma^2)$. Write

$$X_i = \{0, \sigma\} Z_i, \quad i = 1, 2, \ldots, n,$$

where $Z_i$ are iid $N(0, 1)$ RVs. Then the RV $nS_z^2 = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)$ with the PDF

$$\frac{1}{2^{n/2}\Gamma(n/2)} \exp\left(-\frac{n s_z^2}{2}\right)(n s_z^2)^{n/2-1}.$$

The values of $\{0, s_z\}$ are in one-to-one correspondence with those of $\{0, \sigma\}$ through

$$\{0, s_x\} = \{0, \sigma\}\{0, s_z\},$$

where $n s_x^2 = \sum_{i=1}^n x_i^2$ so that $s_x = \sigma s_z$. Considering the variation in $s_z$ as a function of $\sigma$ for fixed $s_x$, we see that $ds_z = s_x(d\sigma/\sigma^2)$. Changing the PDF element of $s_z$ to $\sigma$, we get the PDF of $\sigma$ as


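$$\frac{1}{2^{n/2}\Gamma(n/2)} \exp\left(-\frac{n s_x^2}{2\sigma^2}\right)\left(\frac{n s_x^2}{\sigma^2}\right)^{n/2-1} \frac{2 n s_x^2}{\sigma^3},$$

which is the same as the posterior of $\sigma$ given $s_x$ with prior $\pi(\sigma) = 1/\sigma$.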


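Example 14. Let $X_1, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$ and consider the affine linear group $\mathcal{G} = \{\{a, b\}, -\infty < a < \infty, b > 0\}$. Then

$$X_i = \{\mu, \sigma\} Z_i, \quad i = 1, \ldots, n,$$

where the $Z_i$'s are iid $N(0, 1)$. With $(n - 1)S_z^2 = \sum_{i=1}^n (Z_i - \bar Z)^2$, we know that the joint PDF of $(\bar Z, S_z)$ is given by

$$\sqrt{\frac{n}{2\pi}} \exp\left(-\frac{n \bar z^2}{2}\right) \cdot \frac{2(n - 1)s_z}{2^{(n-1)/2}\,\Gamma[(n - 1)/2]} \left[(n - 1)s_z^2\right]^{[(n-1)/2]-1} \exp\left[-\frac{(n - 1)s_z^2}{2}\right].$$

Further, the values of $\{\bar z, s_z\}$ are in one-to-one correspondence with the values of $\{\mu, \sigma\}$ through

$$\{\bar x, s_x\} = \{\mu, \sigma\}\{\bar z, s_z\} = \{\mu + \sigma \bar z, \sigma s_z\},$$

so that

$$\bar z = \frac{\bar x - \mu}{\sigma} \quad \text{and} \quad s_z = \frac{s_x}{\sigma}.$$

Consider the variation of $(\bar z, s_z)$ as a function of $(\mu, \sigma)$ for fixed $(\bar x, s_x)$. The Jacobian of the transformation from $\{\bar z, s_z\}$ to $\{\mu, \sigma\}$ is given by

$$J = \left|\det\begin{pmatrix} -\dfrac{1}{\sigma} & -\dfrac{\bar x - \mu}{\sigma^2} \\[6pt] 0 & -\dfrac{s_x}{\sigma^2} \end{pmatrix}\right| = \frac{s_x}{\sigma^3}.$$

Hence, the joint PDF of $(\mu, \sigma)$ given $(\bar x, s_x)$ is given by

$$\sqrt{\frac{n}{2\pi}} \exp\left[-\frac{n(\mu - \bar x)^2}{2\sigma^2}\right] \cdot \frac{2(n - 1)}{2^{(n-1)/2}\,\Gamma[(n - 1)/2]} \left[\frac{(n - 1)s_x^2}{\sigma^2}\right]^{[(n-1)/2]-1} \exp\left[-\frac{(n - 1)s_x^2}{2\sigma^2}\right] \frac{s_x^2}{\sigma^4}.$$

This is the PDF that one obtains if $\pi(\mu) = 1$ and $\pi(\sigma) = 1/\sigma$ and $\mu$ and $\sigma$ are independent RVs.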


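The following theorem provides a method for determining minimax estimators.

Theorem 2. Let $\{f_\theta: \theta \in \Theta\}$ be a family of PDFs (PMFs), and suppose that an estimator $\delta^*$ of $\theta$ is a Bayes estimator corresponding to an a priori distribution $\pi$ on $\Theta$. If the risk function $R(\theta, \delta^*)$ is constant on $\Theta$, then $\delta^*$ is a minimax estimator for $\theta$.

Proof. Since $\delta^*$ is the Bayes estimator of $\theta$ with constant risk $r^*$ (free of $\theta$), we have

$$r^* = R(\pi, \delta^*) = \int_{-\infty}^{\infty} R(\theta, \delta^*)\pi(\theta)\,d\theta = \inf_{\delta \in \mathcal{D}} \int R(\theta, \delta)\pi(\theta)\,d\theta \le \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} R(\theta, \delta),$$

where the last step holds because $\int R(\theta, \delta)\pi(\theta)\,d\theta \le \sup_{\theta \in \Theta} R(\theta, \delta)$ for every $\delta$. Similarly, since $r^* = R(\theta, \delta^*)$ for all $\theta \in \Theta$, we have

$$r^* = \sup_{\theta \in \Theta} R(\theta, \delta^*) \ge \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} R(\theta, \delta).$$

Together, we then have

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) = \inf_{\delta \in \mathcal{D}} \sup_{\theta \in \Theta} R(\theta, \delta),$$

which means that $\delta^*$ is minimax.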


The following examples show how to obtain constant risk estimators and the suitable prior distribution.

Example 15 (Hodges and Lehmann [40]). Let $X \sim b(n, p)$, $0 \le p \le 1$. We seek a minimax estimator of $p$ of the form $\alpha X + \beta$, using the squared-error loss function. We have

$$R(p, \delta) = E_p(\alpha X + \beta - p)^2 = E_p[\alpha(X - np) + \beta + (\alpha n - 1)p]^2 = [(\alpha n - 1)^2 - \alpha^2 n]p^2 + [\alpha^2 n + 2\beta(\alpha n - 1)]p + \beta^2,$$

which is a quadratic function of $p$. To find $\alpha$ and $\beta$ such that $R(p, \delta)$ is constant for all $p \in \Theta$, we set the coefficients of $p^2$ and $p$ equal to 0 to get

$$(\alpha n - 1)^2 - \alpha^2 n = 0 \quad \text{and} \quad \alpha^2 n + 2\beta(\alpha n - 1) = 0.$$

It follows that

$$\alpha = \frac{1}{\sqrt{n}(1 + \sqrt{n})} \quad \text{or} \quad \frac{1}{\sqrt{n}(\sqrt{n} - 1)}$$

and

$$\beta = \frac{1}{2(1 + \sqrt{n})} \quad \text{or} \quad -\frac{1}{2(\sqrt{n} - 1)}.$$

Since $0 \le p \le 1$, we discard the second set of roots for both $\alpha$ and $\beta$, and then the estimator is of the form

$$\delta^*(x) = \frac{x}{\sqrt{n}(1 + \sqrt{n})} + \frac{1}{2(1 + \sqrt{n})}.$$

It remains to show that $\delta^*$ is Bayes against some a priori PDF $\pi$. Consider the natural conjugate a priori PDF

$$\pi(p) = [B(\alpha', \beta')]^{-1} p^{\alpha'-1}(1 - p)^{\beta'-1}, \quad 0 \le p \le 1, \quad \alpha', \beta' > 0.$$

The a posteriori PDF of $p$, given $x$, is expressed by

$$h(p \mid x) = \frac{p^{x+\alpha'-1}(1 - p)^{n-x+\beta'-1}}{B(x + \alpha', n - x + \beta')}.$$

It follows that

$$E\{p \mid x\} = \frac{B(x + \alpha' + 1, n - x + \beta')}{B(x + \alpha', n - x + \beta')} = \frac{x + \alpha'}{n + \alpha' + \beta'},$$

which is the Bayes estimator for a squared-error loss. For this to be of the form $\delta^*$, we must have

$$\frac{1}{\sqrt{n}(1 + \sqrt{n})} = \frac{1}{n + \alpha' + \beta'} \quad \text{and} \quad \frac{1}{2(1 + \sqrt{n})} = \frac{\alpha'}{n + \alpha' + \beta'},$$

giving $\alpha' = \beta' = \sqrt{n}/2$. It follows that the estimator $\delta^*(x)$ is minimax with constant risk

$$R(p, \delta^*) = \frac{1}{4(1 + \sqrt{n})^2} \quad \text{for all } p \in [0, 1].$$


Note that the UMVUE (which is also the MLE) is $\delta(X) = X/n$ with risk $R(p, \delta) = p(1 - p)/n$. Comparing the two risks (Figs. 1 and 2), we see that

$$\frac{p(1 - p)}{n} < \frac{1}{4(1 + \sqrt{n})^2} \quad \text{if and only if} \quad \left|p - \frac12\right| > \frac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})},$$

so that

$$R(p, \delta^*) < R(p, \delta)$$

in the interval $(\frac12 - a_n, \frac12 + a_n)$, where $a_n \to 0$ as $n \to \infty$. Moreover,

$$\frac{\sup_p R(p, \delta)}{\sup_p R(p, \delta^*)} = \frac{1/4n}{1/[4(1 + \sqrt{n})^2]} = \frac{n + 2\sqrt{n} + 1}{n} \to 1 \quad \text{as } n \to \infty.$$

Fig. 1. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 1$. [Figure not reproduced: risk plotted against $p$ on $[0, 1]$; $R(p, \delta^*) = 1/16$.]

Fig. 2. Comparison of $R(p, \delta)$ and $R(p, \delta^*)$, $n = 9$. [Figure not reproduced: risk plotted against $p$ on $[0, 1]$; $R(p, \delta^*) = 1/64$.]

Clearly, we would prefer the minimax estimator if $n$ is small, and would prefer the UMVUE because of its simplicity if $n$ is large.
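The constant-risk property of Example 15 and the comparison with the UMVUE can be checked numerically by summing over the binomial PMF. The sketch below uses function names of our own and the value $n = 9$ of Fig. 2; it is an illustration, not a reproduction of the figures.

```python
import math
from math import comb


def minimax_estimate(x, n):
    """delta*(x) = x / (sqrt(n)(1 + sqrt(n))) + 1 / (2(1 + sqrt(n)))."""
    r = math.sqrt(n)
    return x / (r * (1.0 + r)) + 1.0 / (2.0 * (1.0 + r))


def umvue(x, n):
    """delta(x) = x / n."""
    return x / n


def risk(estimator, p, n):
    """Squared-error risk E_p[(estimator(X) - p)^2] for X ~ b(n, p)."""
    return sum(
        comb(n, x) * p ** x * (1.0 - p) ** (n - x) * (estimator(x, n) - p) ** 2
        for x in range(n + 1)
    )


def crossover_halfwidth(n):
    """a_n = sqrt(1 + 2 sqrt(n)) / (2 (1 + sqrt(n))): risks are equal at 1/2 +/- a_n."""
    r = math.sqrt(n)
    return math.sqrt(1.0 + 2.0 * r) / (2.0 * (1.0 + r))


n = 9
for p in (0.1, 0.3, 0.5, 0.9):
    print(p, risk(minimax_estimate, p, n), risk(umvue, p, n))
print("constant risk 1/(4(1+sqrt(n))^2):", 1.0 / (4.0 * (1.0 + math.sqrt(n)) ** 2))
print("a_n:", crossover_halfwidth(n))
```

For $n = 9$ the minimax risk is $1/64$ at every $p$, and the UMVUE has the larger risk only for $p$ within $a_9 = \sqrt{7}/8$ of $\frac12$.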


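Example 16 (Hodges and Lehmann [40]). A lot contains $N$ elements, of which $D$ are defective. A random sample of size $n$ produces $X$ defectives. We wish to estimate $D$. Clearly,

$$P_D\{X = k\} = \binom{D}{k}\binom{N - D}{n - k}\binom{N}{n}^{-1},$$

$$E_D X = \frac{nD}{N} \quad \text{and} \quad \sigma_D^2 = \frac{nD(N - n)(N - D)}{N^2(N - 1)}.$$

Proceeding as in Example 15, we find a linear function of $X$ with constant risk. Indeed, $E_D(\alpha X + \beta - D)^2 = \beta^2$ when

$$\alpha = \frac{N}{n + \sqrt{n(N - n)/(N - 1)}} \quad \text{and} \quad \beta = \frac{N}{2}\left(1 - \frac{\alpha n}{N}\right).$$

We show that $\alpha X + \beta$ is the Bayes estimator corresponding to the a priori PMF

$$P\{D = d\} = c \int_0^1 \binom{N}{d} p^d (1 - p)^{N-d}\, p^{a-1}(1 - p)^{b-1}\,dp,$$

where $a, b > 0$ and $c = \Gamma(a + b)/[\Gamma(a)\Gamma(b)]$. First note that $\sum_{d=0}^{N} P\{D = d\} = 1$, so that

$$\sum_{d=0}^{N} \binom{N}{d} \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a + d)\Gamma(N + b - d)}{\Gamma(N + a + b)} = 1.$$

The Bayes estimator is given by

$$\delta^*(k) = \frac{\sum_{d} d \binom{d}{k}\binom{N - d}{n - k}\binom{N}{d}\Gamma(a + d)\Gamma(N + b - d)}{\sum_{d} \binom{d}{k}\binom{N - d}{n - k}\binom{N}{d}\Gamma(a + d)\Gamma(N + b - d)}.$$

A little simplification, writing $d = (d - k) + k$ and using

$$\binom{d}{k}\binom{N - d}{n - k}\binom{N}{d} = \binom{N - n}{d - k}\binom{n}{k}\binom{N}{n},$$

yields

$$\delta^*(k) = k + \frac{\sum_{i=0}^{N-n} i \binom{N - n}{i}\Gamma(k + a + i)\Gamma(N + b - k - i)}{\sum_{i=0}^{N-n} \binom{N - n}{i}\Gamma(k + a + i)\Gamma(N + b - k - i)} = k\,\frac{a + b + N}{a + b + n} + \frac{a(N - n)}{a + b + n}.$$

Now putting

$$\alpha = \frac{a + b + N}{a + b + n} \quad \text{and} \quad \beta = \frac{a(N - n)}{a + b + n}$$

and solving for $a$ and $b$, we get

$$a = \frac{\beta}{\alpha - 1} \quad \text{and} \quad b = \frac{N - \alpha n - \beta}{\alpha - 1}.$$

Since $a > 0$, $\beta > 0$, and since $b > 0$, $N > \alpha n + \beta$. Moreover, $\alpha > 1$ if $N > n + 1$. If $N = n + 1$, the result is obtained if we give $D$ a binomial distribution with parameter $p = \frac12$. If $N = n$, the result is immediate.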


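The following theorem, which is an extension of Theorem 2, is of considerable help to prove minimaxity of various estimators.

Theorem 3. Let $\{\pi_k(\theta); k \ge 1\}$ be a sequence of prior distributions on $\Theta$ and let $\{\delta_k^*\}$ be the corresponding sequence of Bayes estimators with Bayes risks $R(\pi_k; \delta_k^*)$. If $\limsup_{k \to \infty} R(\pi_k; \delta_k^*) = r^*$ and there exists an estimator $\delta^*$ for which

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*,$$

then $\delta^*$ is minimax.

Proof. Suppose that $\delta^*$ is not minimax. Then there exists an estimator $\tilde\delta$ such that

$$\sup_{\theta \in \Theta} R(\theta, \tilde\delta) < \sup_{\theta \in \Theta} R(\theta, \delta^*).$$

On the other hand, consider the Bayes estimators $\{\delta_k^*\}$ corresponding to the priors $\{\pi_k(\theta)\}$. We obtain

$$R(\pi_k, \delta_k^*) = \int R(\theta, \delta_k^*)\pi_k(\theta)\,d\theta \tag{12}$$

$$\le \int R(\theta, \tilde\delta)\pi_k(\theta)\,d\theta \tag{13}$$

$$\le \sup_{\theta \in \Theta} R(\theta, \tilde\delta). \tag{14}$$

Taking the limit superior over $k$ gives $r^* \le \sup_{\theta \in \Theta} R(\theta, \tilde\delta) < \sup_{\theta \in \Theta} R(\theta, \delta^*)$, which contradicts $\sup_{\theta \in \Theta} R(\theta, \delta^*) \le r^*$. Hence $\delta^*$ is minimax.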


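Example 17. Let $X_1, \ldots, X_n$ be a sample of size $n$ from $\mathcal{N}(\mu, 1)$. Then the MLE
of $\mu$ is $\overline{X}$ with variance $1/n$. We show that $\overline{X}$ is minimax. Let $\mu \sim \mathcal{N}(0, \tau^2)$. Then
the Bayes estimator of $\mu$ is $\overline{X}\,[n\tau^2/(1 + n\tau^2)]$. The Bayes risk of this estimator is

$$R(\pi, \delta_\tau^*) = \frac{1}{n}\left(\frac{n\tau^2}{1 + n\tau^2}\right).$$

Now, as $\tau^2 \to \infty$, $R(\pi, \delta_\tau^*) \to 1/n$, which is the risk of $\overline{X}$. Hence $\overline{X}$ is minimax.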
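
The limiting argument of Example 17 is easy to tabulate. This is a small sketch under stated assumptions: the function name `bayes_risk`, the sample size $n = 10$, and the grid of $\tau^2$ values are illustrative choices, and the formula coded is the Bayes risk $(1/n)\,n\tau^2/(1+n\tau^2)$ given above.

```python
def bayes_risk(n, tau2):
    """Bayes risk of the posterior mean under the N(0, tau2) prior (Example 17)."""
    return (1.0 / n) * (n * tau2) / (1.0 + n * tau2)

n = 10                                   # illustrative sample size
tau2_grid = (1.0, 10.0, 100.0, 1e4, 1e6)  # increasingly diffuse priors
risks = [bayes_risk(n, t2) for t2 in tau2_grid]
for t2, r in zip(tau2_grid, risks):
    print(f"tau^2 = {t2:>9.0f}   Bayes risk = {r:.8f}   (1/n = {1 / n})")
```

The Bayes risks increase toward $1/n$ without ever exceeding it, which is the condition Theorem 3 needs with $r^* = 1/n$.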


Definition 10. A decision rule $\delta$ is inadmissible if there exists a $\delta^* \in \mathcal{D}$ such that
$R(\theta, \delta^*) \le R(\theta, \delta)$, where the inequality is strict for some $\theta \in \Theta$; otherwise, $\delta$ is
admissible.


Theorem 4. If $X_1, \ldots, X_n$ is a sample from $\mathcal{N}(\theta, 1)$, then $\overline{X}$ is an admissible
estimator of $\theta$ under squared error loss $L(\theta, a) = (\theta - a)^2$.


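Proof. Clearly, $\overline{X} \sim \mathcal{N}(\theta, 1/n)$. Suppose that $\overline{X}$ is not admissible. Then there
exists another rule $\delta^*(x)$ such that $R(\theta, \delta^*) \le R(\theta, \overline{X})$, while the inequality is strict
for some $\theta = \theta_0$ (say). Now, the risk $R(\theta, \delta^*)$ is a continuous function of $\theta$ and hence
there exists an $\varepsilon > 0$ such that $R(\theta, \delta^*) < R(\theta, \overline{X}) - \varepsilon$ for $|\theta - \theta_0| < \varepsilon$.

Now consider the prior $\mathcal{N}(0, \tau^2)$. Then the Bayes estimator is

$$\delta_\tau(\overline{X}) = \overline{X}\left(1 + \frac{1}{n\tau^2}\right)^{-1} \quad\text{with risk}\quad \frac{1}{n}\left(\frac{n\tau^2}{1 + n\tau^2}\right).$$

Thus

$$R(\pi, \overline{X}) - R(\pi, \delta_\tau) = \frac{1}{n}\cdot\frac{1}{1 + n\tau^2}.$$

However,

$$\tau\,[R(\pi, \delta^*) - R(\pi, \overline{X})] = \tau\int [R(\theta, \delta^*) - R(\theta, \overline{X})]\,\frac{1}{\sqrt{2\pi}\,\tau}\exp\left(-\frac{\theta^2}{2\tau^2}\right)d\theta
\le -\frac{\varepsilon}{\sqrt{2\pi}}\int_{\theta_0-\varepsilon}^{\theta_0+\varepsilon}\exp\left(-\frac{\theta^2}{2\tau^2}\right)d\theta.$$

We get

$$0 \le \tau\,[R(\pi, \delta^*) - R(\pi, \delta_\tau)] = \tau\,[R(\pi, \delta^*) - R(\pi, \overline{X})] + \tau\,[R(\pi, \overline{X}) - R(\pi, \delta_\tau)]$$

$$\le -\frac{\varepsilon}{\sqrt{2\pi}}\int_{\theta_0-\varepsilon}^{\theta_0+\varepsilon}\exp\left(-\frac{\theta^2}{2\tau^2}\right)d\theta + \frac{\tau}{n(1 + n\tau^2)}.$$

The right-hand side goes to $-2\varepsilon^2/\sqrt{2\pi} < 0$ as $\tau \to \infty$, which contradicts the
nonnegativity of the left-hand side. Hence $\overline{X}$ is admissible under squared error loss.

Thus we have proved that $\overline{X}$ is an admissible minimax estimator of the mean of a
normal distribution $\mathcal{N}(\theta, 1)$.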


PROBLEMS 8.8 


1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has
essentially three choices: (1) to take an umbrella and face the possible prospect of
carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps
get drenched; or (3) to just give up the lecture and stay at home. Let $\Theta = \{\theta_1, \theta_2\}$,
where $\theta_1$ corresponds to rain, and $\theta_2$, to no rain. Let $\mathcal{A} = \{a_1, a_2, a_3\}$, where $a_i$
corresponds to the choice $i$, $i = 1, 2, 3$. Suppose that the following table gives
the losses for the decision problem:


The teacher has to make a decision on the basis of a weather report that depends
on $\theta$ as follows:


$W_1$ (rain)
$W_2$ (no rain)
Find the minimax rule to help the teacher reach a decision.


2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $P(\lambda)$. For estimating $\lambda$, using the
quadratic error loss function, an a priori distribution over $\Theta$, given by the PDF

$$\pi(\lambda) = e^{-\lambda} \ \text{ if } \lambda > 0, \qquad = 0 \ \text{ otherwise},$$

is used.
(a) Find the Bayes estimator for $\lambda$.
(b) If it is required to estimate $\varphi(\lambda) = e^{-\lambda}$ with the same loss function and
same a priori PDF, find the Bayes estimator for $\varphi(\lambda)$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from $b(1, \theta)$. Consider the class of decision
rules $\delta$ of the form $\delta(x_1, x_2, \ldots, x_n) = n^{-1}\sum_{i=1}^{n} x_i + \alpha$, where $\alpha$ is a constant
to be determined. Find $\alpha$ according to the minimax principle, using the loss
function $(\theta - \delta)^2$, where $\delta$ is an estimator for $\theta$.


4. Let $\delta^*$ be a minimax estimator for $\psi(\theta)$ with respect to the squared-error loss
function. Show that $a\delta^* + b$ ($a$, $b$ constants) is a minimax estimator for $a\psi(\theta) + b$.


5. Let $X \sim b(n, \theta)$, and suppose that the a priori PDF of $\theta$ is $U(0, 1)$. Find the
Bayes estimator of $\theta$, using loss function $L(\theta, \delta) = (\theta - \delta)^2/[\theta(1 - \theta)]$. Find a
minimax estimator for $\theta$.


6. In Example 5, find the Bayes estimator for p*. 


7. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(1, 1/\lambda)$. To estimate $\lambda$, let the
a priori PDF on $\lambda$ be $\pi(\lambda) = e^{-\lambda}$, $\lambda > 0$, and let the loss function be squared
error. Find the Bayes estimator of $\lambda$.


8. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. Suppose that the prior distribution of
$\theta$ is a Pareto PDF $\pi(\theta) = \alpha a^{\alpha}/\theta^{\alpha+1}$ for $\theta \ge a$, $= 0$ for $\theta < a$. Using the
quadratic loss function, find the Bayes estimator of $\theta$.


9. Let $T$ be the unique Bayes estimator of $\theta$ with respect to the prior density $\pi$.
Then $T$ is admissible.


10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$. Take
$\pi(\theta) = e^{-\theta}$, $\theta > 0$. Find the Bayes estimator of $\theta$ under quadratic loss.


11. For the PDF of Problem 10, consider the estimation of $\theta$ under quadratic loss.
Consider the class of estimators $a\,(X_{(1)} - 1/n)$ for all $a > 0$. Show that $X_{(1)} - 1/n$
is minimax in this class.


8.9 PRINCIPLE OF EQUIVARIANCE 


Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a family of distributions of some RV $X$. Let $\mathfrak{X} \subseteq \mathcal{R}_n$ be
the sample space of values of $X$. In Section 8.8 we saw that statistical decision
theory revolves around the following four basic elements: the parameter space $\Theta$, the
action space $\mathcal{A}$, the sample space $\mathfrak{X}$, and the loss function $L(\theta, a)$.


Let $\mathcal{G}$ be a group of transformations that map $\mathfrak{X}$ onto itself. We say that $\mathcal{P}$ is
invariant under $\mathcal{G}$ if for each $g \in \mathcal{G}$ and every $\theta \in \Theta$, there is a unique $\theta' = \bar{g}\theta \in \Theta$
such that $g(X) \sim P_{\bar{g}\theta}$ whenever $X \sim P_\theta$. Accordingly,

$$(1)\qquad P_\theta\{g(X) \in A\} = P_{\bar{g}\theta}\{X \in A\}$$

for all Borel subsets $A$ in $\mathcal{R}_n$. We note that the invariance of $\mathcal{P}$ under $\mathcal{G}$ does not change
the class of distributions we begin with; it only changes the parameter or index $\theta$ to
$\bar{g}\theta$. The group $\mathcal{G}$ induces $\bar{\mathcal{G}}$, a group of transformations $\bar{g}$ of $\Theta$ onto itself.


Example 1. Let $X \sim b(n, p)$, $0 < p < 1$. Let $\mathcal{G} = \{g, e\}$, where $g(x) = n - x$
and $e(x) = x$. Then $gg^{-1} = e$. Clearly, $g(X) \sim b(n, 1 - p)$, so that $\bar{g}p = 1 - p$ and
$\bar{e}p = p$. The group $\mathcal{G}$ leaves $\{b(n, p);\ 0 < p < 1\}$ invariant.
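
The invariance in Example 1 can be verified directly from the binomial PMF. The sketch below is illustrative only: the helper name `binom_pmf` and the values $n = 7$, $p = 0.3$ are assumptions. It compares the distribution of $g(X) = n - X$ under $p$ with that of $X$ under $\bar{g}p = 1 - p$.

```python
from math import comb

def binom_pmf(x, n, p):
    """P{X = x} for X ~ b(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 7, 0.3  # illustrative values

# P_p{g(X) = y} = P_p{X = n - y}, since g(x) = n - x.
pmf_of_gX = [binom_pmf(n - y, n, p) for y in range(n + 1)]
# P_{1-p}{X = y}, the distribution indexed by the induced parameter 1 - p.
pmf_under_gbar = [binom_pmf(y, n, 1 - p) for y in range(n + 1)]

max_gap = max(abs(u - v) for u, v in zip(pmf_of_gX, pmf_under_gbar))
print(max_gap)
```

The two PMFs coincide term by term, which is relation (1) for this two-element group.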


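Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs. Consider the group of
affine transformations $\mathcal{G} = \{\{a, b\},\ a \in \mathcal{R},\ b > 0\}$ on $\mathfrak{X}$. The joint PDF of
$\{a, b\}X = (a + bX_1, \ldots, a + bX_n)$ is given by

$$f(x_1, \ldots, x_n) = \frac{1}{(b\sigma\sqrt{2\pi})^{n}}\exp\left\{-\frac{1}{2b^2\sigma^2}\sum_{i=1}^{n}(x_i - a - b\mu)^2\right\},$$

and we see that

$$\bar{g}(\mu, \sigma) = (a + b\mu,\ b\sigma) = \{a, b\}\{\mu, \sigma\}.$$

Clearly, $\mathcal{G}$ leaves the family of joint PDFs of $X$ invariant.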


To apply invariance considerations to a decision problem we need also to ensure 
that the loss function is invariant. 


Definition 1. A decision problem is said to be invariant under a group $\mathcal{G}$ if

(i) $\mathcal{P}$ is invariant under $\mathcal{G}$, and

(ii) the loss function $L$ is invariant in the sense that for every $g \in \mathcal{G}$ and $a \in \mathcal{A}$
there is a unique $a' \in \mathcal{A}$ such that

$$L(\theta, a) = L(\bar{g}\theta, a') \quad\text{for all } \theta.$$

The $a' \in \mathcal{A}$ in Definition 1 is uniquely determined by $g$ and may be denoted by
$\tilde{g}(a)$. One can show that $\tilde{\mathcal{G}} = \{\tilde{g} : g \in \mathcal{G}\}$ is a group of transformations of $\mathcal{A}$ into
itself.


Example 3. Consider the estimation of $\mu$ in sampling from $\mathcal{N}(\mu, 1)$. In Example
8.9.2 we have shown that the normal family is invariant under the location group
$\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$. Consider the quadratic loss function

$$L(\mu, a) = (\mu - a)^2.$$

Then $\{b, 1\}a = b + a$ and $\{b, 1\}\{\mu, 1\} = \{b + \mu, 1\}$. Hence

$$L(\{b, 1\}\mu, \{b, 1\}a) = [(b + \mu) - (b + a)]^2 = (\mu - a)^2 = L(\mu, a).$$

Thus $L(\mu, a)$ is invariant under $\mathcal{G}$ and the problem of estimation of $\mu$ is invariant
under group $\mathcal{G}$.


Example 4. Consider the normal family $\mathcal{N}(0, \sigma^2)$, which is invariant under the
scale group $\mathcal{G} = \{\{0, c\},\ c > 0\}$. Let the loss function be

$$L(\sigma^2, a) = \frac{1}{\sigma^4}(\sigma^2 - a)^2.$$

Now $\{0, c\}a = ca$ and $\{0, c\}\{0, \sigma^2\} = \{0, c\sigma^2\}$ and

$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{1}{c^2\sigma^4}(c\sigma^2 - ca)^2 = \frac{1}{\sigma^4}(\sigma^2 - a)^2 = L(\sigma^2, a).$$

Thus the loss function $L(\sigma^2, a)$ is invariant under $\mathcal{G} = \{\{0, c\},\ c > 0\}$ and the
problem of estimation of $\sigma^2$ is invariant.


Example 5. Consider the loss function

$$L(\sigma^2, a) = \frac{a}{\sigma^2} - 1 - \log\frac{a}{\sigma^2}$$

for the estimation of $\sigma^2$ from the normal family $\mathcal{N}(0, \sigma^2)$. We show that this loss
function is invariant under the scale group. Since

$$\{0, c\}\sigma^2 = \{0, c\sigma^2\} \quad\text{and}\quad \{0, c\}\{0, a\} = \{0, ca\},$$

we have

$$L[\{0, c\}\sigma^2, \{0, c\}a] = \frac{ca}{c\sigma^2} - 1 - \log\frac{ca}{c\sigma^2} = L(\sigma^2, a).$$
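
The invariance computations of Examples 4 and 5 can be spot-checked numerically. The following sketch is illustrative: the function names `scaled_quadratic_loss`, `log_ratio_loss`, and `plain_quadratic_loss`, and the numerical values, are assumptions introduced here. It also shows that the unscaled quadratic loss $(\sigma^2 - a)^2$ is not invariant under the same transformation.

```python
from math import log, isclose

def scaled_quadratic_loss(sigma2, a):
    """Loss of Example 4: (sigma^2 - a)^2 / sigma^4."""
    return (sigma2 - a) ** 2 / sigma2 ** 2

def log_ratio_loss(sigma2, a):
    """Loss of Example 5: a/sigma^2 - 1 - log(a/sigma^2)."""
    r = a / sigma2
    return r - 1 - log(r)

def plain_quadratic_loss(sigma2, a):
    """Unscaled quadratic loss, included only as a contrast."""
    return (sigma2 - a) ** 2

sigma2, a = 2.5, 1.7          # illustrative parameter and action
scales = [0.1, 0.5, 3.0, 40.0]  # illustrative values of c > 0

def invariant(loss):
    # Compare L(c*sigma^2, c*a) with L(sigma^2, a) for each c.
    return all(isclose(loss(c * sigma2, c * a), loss(sigma2, a), rel_tol=1e-9)
               for c in scales)

scaled_ok = invariant(scaled_quadratic_loss)
log_ratio_ok = invariant(log_ratio_loss)
plain_ok = invariant(plain_quadratic_loss)
print(scaled_ok, log_ratio_ok, plain_ok)
```

A finite set of scale factors cannot prove invariance, but it is a quick guard against algebra slips; the algebraic argument above remains the actual proof.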


Let us now return to the problem of estimation of a parametric function $\psi : \Theta \to \mathcal{R}$.
For convenience let us take $\Theta \subseteq \mathcal{R}$ and $\psi(\theta) = \theta$. Then $\mathcal{A} = \Theta$ and $\tilde{\mathcal{G}} = \bar{\mathcal{G}}$.


Suppose that $\theta$ is the mean of PDF $f_\theta$, $\mathcal{G} = \{\{b, 1\},\ b \in \mathcal{R}\}$, and $\{f_\theta\}$ is invariant
under $\mathcal{G}$. Consider the estimator $\delta(X) = \overline{X}$. What we want in an estimator $\delta^*$ of $\theta$ is
that it changes in the same prescribed way as the data are changed. In our case, since
$X$ changes to $\{b, 1\}X = X + b$, we would like $\overline{X}$ to transform to $\{b, 1\}\overline{X} = \overline{X} + b$.

Definition 2. An estimator $\delta(X)$ of $\theta$ is said to be equivariant, under $\mathcal{G}$, if

$$(2)\qquad \delta(gX) = \bar{g}\,\delta(X) \quad\text{for all } g \in \mathcal{G},$$

where we have written $gX$ for $g(X)$ for convenience.


Indeed, $g$ on $\mathfrak{X}$ induces $\bar{g}$ on $\Theta$. Thus if $X \sim f_\theta$, then $gX \sim f_{\bar{g}\theta}$, so if $\delta(X)$
estimates $\theta$ then $\delta(gX)$ should estimate $\bar{g}\theta$. The principle of equivariance requires
that we restrict attention to equivariant estimators and select the "best" estimator in
this class in a sense to be described later in this section.


Example 6. In Example 3, consider the estimators $\delta_1(X) = \overline{X}$, $\delta_2(X) = (X_{(1)} +
X_{(n)})/2$, and $\delta_3(X) = \alpha\overline{X}$, $\alpha$ a fixed real number. Then $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$
induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta$ and both $\delta_1$, $\delta_2$ are equivariant under $\mathcal{G}$. The estimator $\delta_3$ is not
equivariant unless $\alpha = 1$. In Example 1, $\delta(X) = X/n$ is an equivariant estimator
of $p$.


In Example 6, consider the statistic $\delta(X) = S^2$. Note that under the translation
group $\{b, 1\}X = X + b$ and $\delta(\{b, 1\}X) = \delta(X)$. That is, for every $g \in \mathcal{G}$, $\delta(gX) =
\delta(X)$. A statistic $\delta$ is said to be invariant under a group of transformations $\mathcal{G}$ if
$\delta(gX) = \delta(X)$ for all $g \in \mathcal{G}$. When $\mathcal{G}$ is the translation group, an invariant statistic
(function) under $\mathcal{G}$ is called location invariant. Similarly, if $\mathcal{G}$ is the scale group, we
call $\delta$ scale invariant, and if $\mathcal{G}$ is the location-scale group, we call $\delta$ location-scale
invariant. In Example 6, $\delta_4(X) = S^2$ is location invariant but not equivariant, and
$\delta_2(X)$ and $\delta_3(X)$ are not location invariant.
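
The distinction between equivariant and invariant statistics in Example 6 can be illustrated on a single data set. This is a minimal sketch, not a proof: the data, the shift $b$, the value $\alpha = 0.5$, and the helper names (`midrange`, `shrunk_mean`, `is_location_equivariant`, `is_location_invariant`) are all assumptions made for illustration, and $S^2$ is taken to be the usual sample variance with divisor $n - 1$.

```python
from statistics import mean, variance

def midrange(xs):
    """(X_(1) + X_(n)) / 2."""
    return (min(xs) + max(xs)) / 2

def shrunk_mean(xs, alpha=0.5):
    """alpha * sample mean, with alpha != 1."""
    return alpha * mean(xs)

def is_location_equivariant(stat, xs, b, tol=1e-9):
    # Checks stat(x + b) == stat(x) + b on this particular sample and shift.
    return abs(stat([x + b for x in xs]) - (stat(xs) + b)) < tol

def is_location_invariant(stat, xs, b, tol=1e-9):
    # Checks stat(x + b) == stat(x) on this particular sample and shift.
    return abs(stat([x + b for x in xs]) - stat(xs)) < tol

xs = [1.2, -0.4, 3.1, 0.8, 2.2]  # illustrative sample
b = 4.0                          # illustrative shift

summary = {
    name: (is_location_equivariant(stat, xs, b), is_location_invariant(stat, xs, b))
    for name, stat in [("mean", mean), ("midrange", midrange),
                       ("shrunk_mean", shrunk_mean), ("variance", variance)]
}
for name, (equiv, inv) in summary.items():
    print(f"{name:12s} equivariant={equiv}  invariant={inv}")
```

On this sample the mean and midrange shift with the data, the shrunken mean does neither, and the sample variance is unchanged, in line with the classification in the text.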

A very important property of equivariant estimators is that their risk function is
constant on orbits of $\theta$.


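Theorem 1. Suppose that $\delta$ is an equivariant estimator of $\theta$ in a problem that is
invariant under $\mathcal{G}$. Then the risk function of $\delta$ satisfies

$$(3)\qquad R(\bar{g}\theta, \delta) = R(\theta, \delta)$$

for all $\theta \in \Theta$ and $\bar{g} \in \bar{\mathcal{G}}$. If, in particular, $\bar{\mathcal{G}}$ is transitive over $\Theta$, then $R(\theta, \delta)$ is
independent of $\theta$.


Proof. We have for $\theta \in \Theta$ and $g \in \mathcal{G}$,

$$\begin{aligned}
R(\theta, \delta(X)) &= E_\theta L(\theta, \delta(X)) \\
&= E_\theta L(\bar{g}\theta, \bar{g}\delta(X)) && \text{(invariance of } L\text{)} \\
&= E_\theta L(\bar{g}\theta, \delta(g(X))) && \text{(equivariance of } \delta\text{)} \\
&= E_{\bar{g}\theta} L(\bar{g}\theta, \delta(X)) && \text{(invariance of } \{P_\theta\}\text{)} \\
&= R(\bar{g}\theta, \delta(X)).
\end{aligned}$$

In the special case when $\bar{\mathcal{G}}$ is transitive over $\Theta$, for any $\theta_1, \theta_2 \in \Theta$ there exists a
$\bar{g} \in \bar{\mathcal{G}}$ such that $\theta_2 = \bar{g}\theta_1$. It follows that

$$R(\theta_2, \delta) = R(\bar{g}\theta_1, \delta) = R(\theta_1, \delta),$$

so that $R$ is independent of $\theta$.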


Remark 1. When the risk function of every equivariant estimator is constant,
an estimator (in the class of equivariant estimators) that is obtained by minimizing the
constant is called the minimum risk equivariant (MRE) estimator.


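Example 7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f(x, \theta) = \exp[-(x - \theta)], \quad x \ge \theta, \qquad\text{and}\qquad = 0 \ \text{ if } x < \theta.$$

Consider the location group $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$, which induces $\bar{\mathcal{G}}$ on $\Theta$,
where $\bar{\mathcal{G}} = \mathcal{G}$. Clearly, $\bar{\mathcal{G}}$ is transitive. Let $L(\theta, \delta) = (\theta - \delta)^2$. Then the problem of
estimation of $\theta$ is invariant, and according to Theorem 1, the risk of every equivariant
estimator is free of $\theta$. The estimator $\delta_0(X) = X_{(1)} - 1/n$ is equivariant under $\mathcal{G}$ since

$$\delta_0(\{b, 1\}X) = \min_{1\le j\le n}(X_j + b) - \frac{1}{n} = b + X_{(1)} - \frac{1}{n} = b + \delta_0(X).$$

We leave the reader to check that

$$R(\theta, \delta_0) = E_\theta\left(X_{(1)} - \frac{1}{n} - \theta\right)^2 = \frac{1}{n^2},$$

and it will be seen later that $\delta_0$ is the MRE estimator of $\theta$.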

Example 8. In this example we consider sampling from a normal PDF. Let us
first consider estimation of $\mu$ when $\sigma = 1$. Let $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$.
Then $\delta(X) = \overline{X}$ is equivariant under $\mathcal{G}$ and it has the smallest risk $1/n$. Note that
$\{\bar{x}, 1\}^{-1} = \{-\bar{x}, 1\}$ may be used to designate $x$ on its orbits:

$$\{\bar{x}, 1\}^{-1}x = (x_1 - \bar{x}, \ldots, x_n - \bar{x}) = A(x).$$

Clearly, $A(x)$ is invariant under $\mathcal{G}$ and $A(X)$ is ancillary to $\mu$. By Basu's theorem
$A(X)$ and $\overline{X}$ are independent.

Next, consider estimation of $\sigma^2$ with $\mu = 0$ and $\mathcal{G} = \{\{0, c\},\ c > 0\}$. Then
$S_x^2 = \sum_{i=1}^{n} X_i^2$ is an equivariant estimator of $\sigma^2$. Note that $\{0, s_x\}^{-1}$ may be used to
designate $x$ on its orbits:

$$\{0, s_x\}^{-1}x = \left(\frac{x_1}{s_x}, \ldots, \frac{x_n}{s_x}\right) = A(x).$$

Again, $A(x)$ is invariant under $\mathcal{G}$ and $A(X)$ is ancillary to $\sigma^2$. Moreover, $S_x^2$ and $A(X)$
are independent.

Finally, we consider estimation of $(\mu, \sigma^2)$ when $\mathcal{G} = \{\{b, c\},\ -\infty < b < \infty,\ c > 0\}$.
Then $(\overline{X}, S_x^2)$, where $S_x^2 = \sum_{i=1}^{n}(X_i - \overline{X})^2$, is an equivariant estimator of
$(\mu, \sigma^2)$. Also, $\{\bar{x}, s_x\}^{-1}$ may be used to designate $x$ on its orbits:

$$\{\bar{x}, s_x\}^{-1}x = \left(\frac{x_1 - \bar{x}}{s_x}, \ldots, \frac{x_n - \bar{x}}{s_x}\right) = A(x).$$


Note that the statistic $A(X)$ defined in each of the three cases considered in
Example 8 is constant on its orbits. A statistic $A$ is said to be maximal invariant if

(i) $A$ is invariant, and

(ii) $A$ is maximal, that is, $A(x_1) = A(x_2) \Rightarrow x_1 = g(x_2)$ for some $g \in \mathcal{G}$.

We now derive an explicit expression for the MRE estimator of a location parameter.
Let $X_1, X_2, \ldots, X_n$ be iid with common PDF $f_\theta(x) = f(x - \theta)$, $-\infty < \theta < \infty$.
Then $\{f_\theta : \theta \in \Theta\}$ is invariant under $\mathcal{G} = \{\{b, 1\},\ -\infty < b < \infty\}$, and an estimator
$\delta$ of $\theta$ is equivariant if

$$\delta(\{b, 1\}X) = \delta(X) + b$$

for all real $b$.

Lemma 1. An estimator $\delta$ is equivariant for $\theta$ if and only if

$$(4)\qquad \delta(X) = X_1 + q(X_2 - X_1, \ldots, X_n - X_1),$$

for some function $q$.

Proof. If (4) holds, then

$$\delta(\{b, 1\}x) = b + x_1 + q(x_2 - x_1, \ldots, x_n - x_1) = b + \delta(x).$$

Conversely,

$$\delta(x) = \delta(x_1 + x_1 - x_1,\ x_1 + x_2 - x_1,\ \ldots,\ x_1 + x_n - x_1) = x_1 + \delta(0, x_2 - x_1, \ldots, x_n - x_1),$$

which is (4) with $q(x_2 - x_1, \ldots, x_n - x_1) = \delta(0, x_2 - x_1, \ldots, x_n - x_1)$.


From Theorem 1 the risk function of an equivariant estimator $\delta$ is constant with
risk

$$R(\theta, \delta) = R(0, \delta) = E_0[\delta(X)]^2 \quad\text{for all } \theta,$$

where the expectation is with respect to the PDF $f_0(x) = f(x)$. Consequently,
among all equivariant estimators $\delta$ for $\theta$, the MRE estimator is $\delta_0$, satisfying

$$R(0, \delta_0) = \min_{\delta} R(0, \delta).$$

Thus we only need to choose the function $q$ in (4).
Let $L(\theta, \delta)$ be the loss function. Invariance considerations require that

$$L(\theta, \delta) = L(\bar{g}\theta, \bar{g}\delta) = L(\theta + b, \delta + b)$$

for all real $b$, so that $L(\theta, \delta)$ must be some function $w$ of $\delta - \theta$.

Let $Y_i = X_i - X_1$, $i = 2, \ldots, n$, and $Y = (Y_2, \ldots, Y_n)$, and let $g(y)$ be the joint
PDF of $Y$ under $\theta = 0$. Let $h(x_1 \mid y)$ be the conditional density, under $\theta = 0$, of $X_1$
given $Y = y$. Writing the equivariant estimator as $\delta(X) = X_1 - q(Y)$ (the sign of $q$ in (4)
is immaterial, since $q$ is an arbitrary function), we have

$$(5)\qquad R(0, \delta) = E_0[w(X_1 - q(Y))] = \int\left[\int w(x_1 - q(y))\,h(x_1 \mid y)\,dx_1\right] g(y)\,dy.$$

Then $R(0, \delta)$ will be minimized by choosing, for each fixed $y$, $q(y)$ to be that
value of $c$ that minimizes

$$(6)\qquad \int w(u - c)\,h(u \mid y)\,du.$$

Necessarily, $q$ depends on $y$. In the special case $w(d - \theta) = (d - \theta)^2$, the integral
in (6) is minimum when $c$ is chosen to be the mean of the conditional distribution.
Thus the unique MRE estimator of $\theta$ is given by

$$(7)\qquad \delta_0(x) = x_1 - E_0\{X_1 \mid Y = y\}.$$

This is the Pitman estimator. Let us simplify it a little more by computing
$E_0\{x_1 - X_1 \mid Y = y\}$.

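First we need to compute $h(u \mid y)$. When $\theta = 0$, the joint PDF of $X_1, Y_2, \ldots, Y_n$
is easily seen to be

$$f(x_1)\,f(x_1 + y_2)\cdots f(x_1 + y_n),$$

so the joint PDF of $(Y_2, \ldots, Y_n)$ is given by

$$\int_{-\infty}^{\infty} f(u)\,f(u + y_2)\cdots f(u + y_n)\,du.$$

It follows that

$$(8)\qquad h(u \mid y) = \frac{f(u)\,f(u + y_2)\cdots f(u + y_n)}{\int_{-\infty}^{\infty} f(u)\,f(u + y_2)\cdots f(u + y_n)\,du}.$$

Now let $Z = x_1 - X_1$. Then the conditional PDF of $Z$ given $y$ is $h(x_1 - z \mid y)$. It
follows from (8) that

$$(9)\qquad \delta_0(x) = E_0\{Z \mid y\} = \int_{-\infty}^{\infty} z\,h(x_1 - z \mid y)\,dz
= \frac{\int_{-\infty}^{\infty} z\prod_{j=1}^{n} f(x_j - z)\,dz}{\int_{-\infty}^{\infty}\prod_{j=1}^{n} f(x_j - z)\,dz}.$$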
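
Formula (9) lends itself to direct numerical evaluation. The sketch below is only an illustration under stated assumptions: the function name `pitman_location`, the trapezoidal rule, the truncation `half_width`, the number of `steps`, and the sample values are all choices made here, not part of the text. It evaluates the ratio of integrals in (9) for a user-supplied density $f$ and compares the result with two closed forms: the sample mean for the standard normal density, and $X_{(1)} - 1/n$ for $f(x) = e^{-x}$, $x \ge 0$.

```python
import math

def pitman_location(xs, f, half_width=12.0, steps=48000):
    """Approximate (9): ratio of integrals of z*prod f(x_j - z) and prod f(x_j - z).

    The integrals are truncated to [min(xs) - half_width, max(xs) + half_width]
    and computed with the trapezoidal rule (an illustrative numerical choice).
    """
    lo = min(xs) - half_width
    hi = max(xs) + half_width
    h = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        like = 1.0
        for x in xs:
            like *= f(x - z)
        num += weight * z * like
        den += weight * like
    return num / den  # the step size h cancels in the ratio

def std_normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def std_exponential_pdf(u):
    return math.exp(-u) if u >= 0 else 0.0

xs = [0.3, 1.1, 2.0, 0.7]  # illustrative sample
n = len(xs)

pitman_normal = pitman_location(xs, std_normal_pdf)
pitman_exponential = pitman_location(xs, std_exponential_pdf)
print(pitman_normal, sum(xs) / n)
print(pitman_exponential, min(xs) - 1.0 / n)
```

The exponential case has a jump in the integrand at $z = x_{(1)}$, so the trapezoidal approximation is only accurate to roughly the grid spacing there; a finer grid or an exact antiderivative would be preferable in serious use.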


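Remark 2. Since the joint PDF of $X_1, X_2, \ldots, X_n$ is $\prod_{j=1}^{n} f_\theta(x_j) = \prod_{j=1}^{n} f(x_j - \theta)$,
the joint PDF of $\theta$ and $X$ when $\theta$ has prior $\pi(\theta)$ is $\pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)$. The
joint marginal of $X$ is $\int_{-\infty}^{\infty}\pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)\,d\theta$. It follows that the conditional
PDF of $\theta$ given $X = x$ is given by

$$\frac{\pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)}{\int_{-\infty}^{\infty}\pi(\theta)\prod_{j=1}^{n} f(x_j - \theta)\,d\theta}.$$

Taking $\pi(\theta) = 1$, the improper uniform prior on $\Theta$, we see from (9) that $\delta_0(x)$ is the
Bayes estimator of $\theta$ under squared-error loss and prior $\pi(\theta) = 1$. Since the risk of
$\delta_0$ is constant, it follows that $\delta_0$ is also a minimax estimator of $\theta$.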


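Remark 3. Suppose that $S$ is sufficient for $\theta$. Then $\prod_{j=1}^{n} f_\theta(x_j) = g_\theta(s)h(x)$,
so that the Pitman estimator of $\theta$ can be rewritten as

$$\delta_0(x) = \frac{\int_{-\infty}^{\infty}\theta\prod_{j=1}^{n} f_\theta(x_j)\,d\theta}{\int_{-\infty}^{\infty}\prod_{j=1}^{n} f_\theta(x_j)\,d\theta}
= \frac{\int_{-\infty}^{\infty}\theta\,g_\theta(s)h(x)\,d\theta}{\int_{-\infty}^{\infty} g_\theta(s)h(x)\,d\theta}
= \frac{\int_{-\infty}^{\infty}\theta\,g_\theta(s)\,d\theta}{\int_{-\infty}^{\infty} g_\theta(s)\,d\theta},$$

which is a function of $s$ alone.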


Examples 7 and 8 (continued). A direct computation using (9) shows that $X_{(1)} - 1/n$
is the Pitman MRE estimator of $\theta$ in Example 7, and $\overline{X}$ is the MRE estimator
of $\mu$ in Example 8 (when $\sigma = 1$). The results can be obtained by using sufficiency
reduction. In Example 7, $X_{(1)}$ is the minimal sufficient statistic for $\theta$. Every
(translation) equivariant function based on $X_{(1)}$ must be of the form $\delta_c(X) = X_{(1)} + c$,
where $c$ is a real number. Then

$$R(\theta, \delta_c) = E_\theta\{X_{(1)} + c - \theta\}^2 = E_\theta\left\{X_{(1)} - \frac{1}{n} - \theta + \left(c + \frac{1}{n}\right)\right\}^2$$

$$= R(\theta, \delta_0) + \left(c + \frac{1}{n}\right)^2 = \left(\frac{1}{n}\right)^2 + \left(c + \frac{1}{n}\right)^2,$$

which is minimized for $c = -1/n$. In Example 8, $\overline{X}$ is the minimal sufficient statistic,
so every equivariant function of $\overline{X}$ must be of the form $\delta_c(X) = \overline{X} + c$, where $c$ is a
real constant. Then

$$R(\mu, \delta_c) = E_\mu(\overline{X} + c - \mu)^2 = \frac{1}{n} + c^2,$$

which is minimized for $c = 0$.


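Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $U(\theta - \frac12, \theta + \frac12)$. Then $(X_{(1)}, X_{(n)})$ is
jointly sufficient for $\theta$. Clearly,

$$f(x_1 - \theta, \ldots, x_n - \theta) = \begin{cases} 1, & x_{(n)} - \frac12 < \theta < x_{(1)} + \frac12, \\ 0, & \text{otherwise}, \end{cases}$$

so that the Pitman estimator of $\theta$ is given by

$$\delta_0(X) = \frac{\int_{X_{(n)} - 1/2}^{X_{(1)} + 1/2}\theta\,d\theta}{\int_{X_{(n)} - 1/2}^{X_{(1)} + 1/2} d\theta} = \frac{X_{(1)} + X_{(n)}}{2}.$$
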
We now consider, briefly, the Pitman estimator of a scale parameter. Let $X$ have a
joint PDF

$$f_\sigma(x) = \frac{1}{\sigma^n}\,f\left(\frac{x_1}{\sigma}, \ldots, \frac{x_n}{\sigma}\right),$$

where $f$ is known and $\sigma > 0$ is a scale parameter. The family $\{f_\sigma : \sigma > 0\}$ remains
invariant under $\mathcal{G} = \{\{0, c\},\ c > 0\}$, which induces $\bar{\mathcal{G}} = \mathcal{G}$ on $\Theta$. Then for estimation
of $\sigma^k$, the loss function $L(\sigma, a)$ is invariant under these transformations if and only if
$L(\sigma, a) = w(a/\sigma^k)$. An estimator $\delta$ of $\sigma^k$ is equivariant under $\mathcal{G}$ if

$$\delta(\{0, c\}X) = c^k\delta(X) \quad\text{for all } c > 0.$$

Some simple examples of scale-equivariant estimators of $\sigma$ are the mean deviation
$\sum_{i=1}^{n}|X_i - \overline{X}|/n$ and the standard deviation $\sqrt{\sum_{i=1}^{n}(X_i - \overline{X})^2/(n - 1)}$. We note that the
group $\bar{\mathcal{G}}$ over $\Theta$ is transitive, so according to Theorem 1, the risk of any equivariant
estimator of $\sigma^k$ is free of $\sigma$ and an MRE estimator minimizes this risk over the class
of all equivariant estimators of $\sigma^k$. Using the loss function $L(\sigma, a) = w(a/\sigma^k) =
(a - \sigma^k)^2/\sigma^{2k}$, it can be shown that the MRE estimator of $\sigma^k$, also known as the
Pitman estimator of $\sigma^k$, is given by

$$\delta_0(x) = \frac{\int_0^\infty v^{n+k-1} f(vx_1, \ldots, vx_n)\,dv}{\int_0^\infty v^{n+2k-1} f(vx_1, \ldots, vx_n)\,dv}.$$

Just as in the location case, one can show that $\delta_0$ is a function of the minimal
sufficient statistic and $\delta_0$ is the Bayes estimator of $\sigma^k$ with improper prior $\pi(\sigma) =
1/\sigma^{2k+1}$. Consequently, $\delta_0$ is minimax.


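Example 8 (continued). In Example 8, the Pitman estimator of $\sigma^k$ is easily
shown to be

$$\delta_0(X) = \frac{\Gamma[(n + k)/2]}{\Gamma[(n + 2k)/2]}\left(\frac{1}{2}\sum_{i=1}^{n} X_i^2\right)^{k/2}.$$

Thus the MRE estimator of $\sigma$ is given by $\Gamma[(n + 1)/2]\sqrt{\sum_{i=1}^{n} X_i^2}\,/\,\{\sqrt{2}\,\Gamma[(n + 2)/2]\}$ and
that of $\sigma^2$ by $\sum_{i=1}^{n} X_i^2/(n + 2)$.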
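
The closed form for the normal scale family can be cross-checked against the general ratio-of-integrals formula for the Pitman estimator of $\sigma^k$. The sketch below is illustrative: the function names `pitman_scale_normal` and `pitman_scale_numeric`, the trapezoidal rule, the upper integration limit, and the sample values are assumptions. For $\mathcal{N}(0, \sigma^2)$ data the kernel is $f(vx_1, \ldots, vx_n) \propto \exp(-v^2\sum x_i^2/2)$, and the proportionality constant cancels in the ratio.

```python
from math import gamma, exp

def pitman_scale_normal(xs, k):
    """Closed form: Gamma((n+k)/2) / Gamma((n+2k)/2) * (sum x_i^2 / 2)^(k/2)."""
    n = len(xs)
    s = sum(x * x for x in xs)
    return gamma((n + k) / 2) / gamma((n + 2 * k) / 2) * (s / 2) ** (k / 2)

def pitman_scale_numeric(xs, k, upper=10.0, steps=20000):
    """Ratio of integrals of v^(n+k-1) and v^(n+2k-1) against exp(-v^2 s / 2).

    Trapezoidal rule on [0, upper]; both the rule and the cutoff are
    illustrative numerical choices.
    """
    n = len(xs)
    s = sum(x * x for x in xs)
    h = upper / steps
    num = den = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        kernel = exp(-0.5 * v * v * s)
        num += weight * v ** (n + k - 1) * kernel
        den += weight * v ** (n + 2 * k - 1) * kernel
    return num / den

xs = [0.5, -1.2, 2.0, 0.3]  # illustrative sample
n = len(xs)
s = sum(x * x for x in xs)

closed_k1 = pitman_scale_normal(xs, 1)
closed_k2 = pitman_scale_normal(xs, 2)
numeric_k1 = pitman_scale_numeric(xs, 1)
numeric_k2 = pitman_scale_numeric(xs, 2)
print(closed_k1, numeric_k1)
print(closed_k2, numeric_k2, s / (n + 2))
```

For $k = 2$ the closed form collapses to $\sum x_i^2/(n + 2)$, which the last printed line makes visible.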


Example 10. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$. The Pitman estimator of $\theta$ is
given by

$$\delta_0(X) = \frac{\int_0^{1/X_{(n)}} v^{n}\,dv}{\int_0^{1/X_{(n)}} v^{n+1}\,dv} = \frac{n + 2}{n + 1}\,X_{(n)}.$$


Finally, we consider, briefly, estimation of the mean vector of a multivariate normal
distribution. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ be a column vector and $I_p$ be the $p \times p$
identity matrix. Let $X_1, X_2, \ldots, X_n$ be a sample from a $p$-variate normal distribution
with mean vector $\theta$ and variance-covariance matrix $I_p$. Let $L(\theta, a) = (\theta - a)'(\theta - a)
= \sum_{i=1}^{p}(\theta_i - a_i)^2$. In the univariate ($p = 1$) case we have seen that the
sample mean $\overline{X}$ is a minimax and admissible estimator of $\theta$. It is therefore natural
to consider $\overline{X} = (\overline{X}_1, \ldots, \overline{X}_p)'$ as an estimator of $\theta$ also in the $p$-variate case
and suspect that it has the same properties as in the $p = 1$ case. Certainly, $\overline{X}$ is a
minimax estimator, but is it admissible, too? Stein [108] showed that $\overline{X}$ is admissible for
$p = 2$. But for $p \ge 3$, James and Stein [45] showed that the estimator

$$(10)\qquad \delta^{S}(\overline{X}) = \left(1 - \frac{p - 2}{n\,\overline{X}'\overline{X}}\right)\overline{X}$$

improves on $\overline{X}$ for all $\theta$.

This is a surprising result but is typical in a variety of multiparameter estimation
problems. What is optimal in independent estimation problems is not necessarily
optimal if the problems are considered simultaneously. It should be noted, however,
that $\delta^{S}$ does not share the other optimality properties of $\overline{X}$. It is not the MLE, is biased,
and is not equivariant. It only dominates $\overline{X}$ under quadratic loss.

The estimator $\delta^{S}$ takes $\overline{X}$ and shrinks it toward the origin (provided $n\,\overline{X}'\overline{X} > p - 2$).
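
A small Monte Carlo experiment makes the dominance claim concrete. This is a sketch under explicit assumptions, not the authors' computation: it takes $\overline{X} \sim \mathcal{N}_p(\theta, I_p/n)$ and writes the shrinkage factor as $1 - (p - 2)/(n\,\overline{X}'\overline{X})$, which is the familiar James-Stein factor when $n = 1$. The function names, the seed, the number of replications, and the choice of $\theta$, $p$, and $n$ are all illustrative.

```python
import random

def james_stein(xbar, n):
    """Shrink xbar toward the origin by the factor 1 - (p - 2) / (n * xbar'xbar)."""
    p = len(xbar)
    norm2 = sum(v * v for v in xbar)
    shrink = 1.0 - (p - 2) / (n * norm2)
    return [shrink * v for v in xbar]

def simulated_risks(theta, n, reps=20000, seed=0):
    """Monte Carlo estimates of the quadratic risks of xbar and of the shrunken rule."""
    rng = random.Random(seed)
    sd = (1.0 / n) ** 0.5  # each coordinate of xbar has variance 1/n
    loss_mean = 0.0
    loss_js = 0.0
    for _ in range(reps):
        xbar = [rng.gauss(t, sd) for t in theta]
        js = james_stein(xbar, n)
        loss_mean += sum((x - t) ** 2 for x, t in zip(xbar, theta))
        loss_js += sum((d - t) ** 2 for d, t in zip(js, theta))
    return loss_mean / reps, loss_js / reps

theta = [0.5] * 5  # illustrative mean vector, p = 5
n = 4              # illustrative sample size
risk_mean, risk_js = simulated_risks(theta, n)
print(f"risk of sample mean ~ {risk_mean:.4f} (theory p/n = {len(theta) / n})")
print(f"risk of shrunken estimator ~ {risk_js:.4f}")
```

The same random draws are used for both estimators, so the comparison is paired and the difference in average loss is far less noisy than either average alone.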
PROBLEMS 8.9


In all problems assume that $X_1, X_2, \ldots, X_n$ is a random sample from the distribution
under consideration.


1. Show that the following statistics are equivariant under translation group:
(a) Median $(X_i)$.
(b) $(X_{(1)} + X_{(n)})/2$.
(c) $X_{([np]+1)}$, the quantile of order $p$, $0 < p < 1$.
(d) $(X_{(r+1)} + X_{(r+2)} + \cdots + X_{(n-r)})/(n - 2r)$.
(e) $\overline{X} + \overline{Y}$, where $\overline{Y}$ is the mean of a sample of size $m$, $m \ne n$.


2. Show that the following statistics are invariant under location or scale or
location-scale group:
(a) $\overline{X} - \text{median}(X_i)$.
(b) $X_{(n+1-k)} - X_{(k)}$.
(c) $\sum_{i=1}^{n}|X_i - \overline{X}|/n$.
(d) $\dfrac{\sum_{i=1}^{n}(X_i - \overline{X})(Y_i - \overline{Y})}{\left[\sum_{i=1}^{n}(X_i - \overline{X})^2\sum_{i=1}^{n}(Y_i - \overline{Y})^2\right]^{1/2}}$, where $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a
random sample from a bivariate distribution.


3. Let the common distribution be $G(\alpha, \sigma)$, where $\alpha$ ($> 0$) is known and $\sigma > 0$ is
unknown. Find the MRE estimator for $\sigma$ under loss $L(\sigma, a) = (1 - a/\sigma)^2$.


4. Let the common PDF be the folded normal distribution

$$\sqrt{\frac{2}{\pi}}\exp\left[-\frac{1}{2}(x - \mu)^2\right] I_{[\mu, \infty)}(x).$$

Verify that the best equivariant estimator of $\mu$ under quadratic loss is given by

$$\hat{\mu} = \overline{X} - \frac{\exp[-(n/2)(X_{(1)} - \overline{X})^2]}{\sqrt{2n\pi}\left[\int_{-\infty}^{\sqrt{n}(X_{(1)} - \overline{X})}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\,dz\right]}.$$

5. Let $X \sim U(\theta, 2\theta)$.
(a) Show that $(X_{(1)}, X_{(n)})$ is a jointly sufficient statistic for $\theta$.
(b) Verify whether or not $(X_{(n)} - X_{(1)})$ is an unbiased estimator of $\theta$. Find an
ancillary statistic.
(c) Determine the best invariant estimator of $\theta$ under the loss function $L(\theta, a) =
(1 - a/\theta)^2$.


6. Let

$$f_\theta(x) = \frac{1}{2}\exp\{-|x - \theta|\}.$$

Find the Pitman estimator of $\theta$.


7. Let $f_\theta(x) = \exp[-(x - \theta)]\cdot\{1 + \exp[-(x - \theta)]\}^{-2}$, for $x \in \mathcal{R}$, $\theta \in \mathcal{R}$. Find
the Pitman estimator of $\theta$.


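8. Show that an estimator $\delta$ is (location) equivariant if and only if

$$\delta(x) = \delta_0(x) + \phi(x),$$

where $\delta_0$ is any equivariant estimator and $\phi$ is an invariant function.


9. Let $X_1, X_2$ be iid with PDF

$$f_\sigma(x) = \frac{2}{\sigma}\left(1 - \frac{x}{\sigma}\right), \quad 0 < x < \sigma, \qquad\text{and}\qquad = 0 \ \text{ otherwise}.$$

Find, explicitly, the Pitman estimator of $\sigma^r$.

10. Let $X_1, X_2, \ldots, X_n$ be iid with PDF

$$f_\theta(x) = \frac{1}{\theta}\exp\left(-\frac{x}{\theta}\right), \quad x > 0, \qquad\text{and}\qquad = 0 \ \text{ otherwise}.$$

Find the Pitman estimator of $\theta^k$.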


CHAPTER 9 


Neyman—Pearson Theory of 
Testing of Hypotheses 


9.1 INTRODUCTION 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a population distribution $F_\theta$, $\theta \in
\Theta$, where the functional form of $F_\theta$ is known except perhaps for the parameter $\theta$.
For example, the $X_i$'s may be a random sample from $\mathcal{N}(\theta, 1)$, where $\theta \in \mathcal{R}$ is
not known. In many practical problems the experimenter is interested in testing the
validity of an assertion about the unknown parameter $\theta$. For example, in a
coin-tossing experiment it is of interest to test, in some sense, whether the (unknown)
probability of heads $p$ equals a given number $p_0$, $0 < p_0 < 1$. Similarly, it is
of interest to check the claim of a car manufacturer about the average mileage per
gallon of gasoline achieved by a particular model. A problem of this type is usually
referred to as a problem of testing of hypotheses and is the subject of discussion in
this chapter. We develop the fundamentals of Neyman-Pearson theory. In Section 9.2
we introduce the various concepts involved. In Section 9.3 the fundamental
Neyman-Pearson lemma is proved, and Sections 9.4 and 9.5 deal with some basic results in
the testing of composite hypotheses. Section 9.6 deals with locally optimal tests.


9.2 SOME FUNDAMENTAL NOTIONS OF HYPOTHESES TESTING 


In Chapter 8 we discussed the problem of point estimation in sampling from a pop- 
ulation whose distribution is known except for a finite number of unknown parame- 
ters. Here we consider another important problem in statistical inference, the testing 
of statistical hypotheses. We begin by considering the following examples. 


Example 1. In coin-tossing experiments one frequently assumes that the coin is
fair, that is, the probability of getting heads or tails is the same: $\frac12$. How does one test
whether the coin is fair (unbiased) or loaded (biased)? If one is guided by intuition, a
reasonable procedure would be to toss the coin $n$ times, say, and count the number of
heads. If the proportion of heads observed does not deviate "too much" from $p = \frac12$,
one would tend to conclude that the coin is fair.


Example 2. It is usual for manufacturers to make quantitative assertions about
their products. For example, a manufacturer of 12-volt batteries may claim that a
certain brand of their batteries lasts for $N$ hours. How does one go about checking
the truth of this assertion? A reasonable procedure suggests itself: Take a random
sample of $n$ batteries of the brand in question and note their length of life under
more or less identical conditions. If the average length of life is "much smaller" than
$N$, one would tend to doubt the manufacturer's claim.


To fix ideas, let us define formally the concepts involved. As usual, $X = (X_1, X_2,
\ldots, X_n)$ and let $X \sim F_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$. It will be assumed that the functional form
of $F_\theta$ is known except for the parameter $\theta$. Also, we assume that $\Theta$ contains at least
two points.


Definition 1. A parametric hypothesis is an assertion about the unknown parameter
$\theta$. It is usually referred to as the null hypothesis, $H_0\colon \theta \in \Theta_0 \subset \Theta$. The statement
$H_1\colon \theta \in \Theta_1 = \Theta - \Theta_0$ is usually referred to as the alternative hypothesis.


Usually, the null hypothesis is chosen to correspond to the smaller or simpler
subset $\Theta_0$ of $\Theta$ and is a statement of "no difference," whereas the alternative represents
change.


Definition 2. If $\Theta_0$ ($\Theta_1$) contains only one point, we say that $\Theta_0$ ($\Theta_1$) is simple;
otherwise, composite. Thus, if a hypothesis is simple, the probability distribution of
$X$ is specified completely under that hypothesis.


Example 3. Let $X \sim \mathcal{N}(\mu, \sigma^2)$. If both $\mu$ and $\sigma^2$ are unknown, $\Theta = \{(\mu, \sigma^2)\colon
-\infty < \mu < \infty,\ \sigma^2 > 0\}$. The hypothesis $H_0\colon \mu \le \mu_0$, $\sigma^2 > 0$, where $\mu_0$ is a known
constant, is a composite null hypothesis. The alternative hypothesis is $H_1\colon \mu > \mu_0$,
$\sigma^2 > 0$, which is also composite. Similarly, the null hypothesis $\mu = \mu_0$, $\sigma^2 > 0$ is
composite.

If $\sigma^2 = \sigma_0^2$ is known, the hypothesis $H_0\colon \mu = \mu_0$ is a simple hypothesis.


Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs. Some hypotheses of interest
are $p = \frac12$, $p \le \frac12$, $p \ge \frac12$ or, quite generally, $p = p_0$, $p \le p_0$, $p \ge p_0$, where $p_0$ is
a known number, $0 < p_0 < 1$.


The problem of testing of hypotheses may be described as follows: Given the
sample point $x = (x_1, x_2, \ldots, x_n)$, find a decision rule (function) that will lead to
a decision to reject or fail to reject the null hypothesis. In other words, partition the
sample space into two disjoint sets $C$ and $C^c$ such that if $x \in C$, we reject $H_0$, and if
$x \in C^c$, we fail to reject $H_0$. In the following we write "accept $H_0$" when we fail to
reject $H_0$. We emphasize that when the sample point $x \in C^c$ and we fail to reject $H_0$,
it does not mean that $H_0$ gets our stamp of approval. It simply means that the sample
does not have enough evidence against $H_0$.

Definition 3. Let $X \sim F_\theta$, $\theta \in \Theta$. A subset $C$ of $\mathcal{R}_n$ such that if $x \in C$, then $H_0$
is rejected (with probability 1) is called the critical region (set):

$$C = \{x \in \mathcal{R}_n\colon H_0 \text{ is rejected if } x \in C\}.$$

There are two types of errors that can be made if one uses such a procedure. One
may reject $H_0$ when in fact it is true, called a type I error, or accept $H_0$ when it is
false, called a type II error:


                          True
                   H_0             H_1
Accept   H_0     Correct         Type II error
         H_1     Type I error    Correct


If $C$ is the critical region of a rule, $P_\theta C$, $\theta \in \Theta_0$, is a probability of type I error,
and $P_\theta C^c$, $\theta \in \Theta_1$, is a probability of type II error. Ideally, one would like to find a
critical region for which both these probabilities are 0. This will be the case if we can
find a subset $S \subseteq \mathcal{R}_n$ such that $P_\theta S = 1$ for every $\theta \in \Theta_0$ and $P_\theta S = 0$ for every
$\theta \in \Theta_1$. Unfortunately, situations such as this do not arise in practice, although they
are conceivable. For example, let $X \sim \mathcal{C}(1, \theta)$ under $H_0$ and $X \sim P(\theta)$ under $H_1$.
Usually, if a critical region is such that the probability of type I error is 0, it will be
of the form "do not reject $H_0$" and the probability of type II error will then be 1.

The procedure used in practice is to limit the probability of type I error to a
preassigned level $\alpha$ (usually, 0.01 or 0.05) that is small and to minimize the probability
of type II error. To restate our problem in terms of this requirement, let us formulate
these notions.


Definition 4. Every Borel-measurable mapping $\varphi$ of $\mathcal{R}_n \to [0, 1]$ is known as a
test function.


Some simple examples of test functions are $\varphi(x) = 1$ for all $x \in \mathcal{R}_n$, $\varphi(x) = 0$
for all $x \in \mathcal{R}_n$, or $\varphi(x) = \alpha$, $0 \le \alpha \le 1$, for all $x \in \mathcal{R}_n$. In fact, Definition 4 includes
Definition 3 in the sense that whenever $\varphi$ is the indicator function of some Borel
subset $A$ of $\mathcal{R}_n$, $A$ is called the critical region (of the test $\varphi$).


Definition 5. The mapping $\varphi$ is said to be a test of hypothesis $H_0\colon \theta \in \Theta_0$
against the alternatives $H_1\colon \theta \in \Theta_1$, with error probability $\alpha$ (also called level of
significance or, simply, level) if

$$(1)\qquad E_\theta\varphi(X) \le \alpha \quad\text{for all } \theta \in \Theta_0.$$

We shall say, in short, that $\varphi$ is a test for the problem $(\alpha, \Theta_0, \Theta_1)$.

Let us write $\beta_\varphi(\theta) = E_\theta\varphi(X)$. Our objective, in practice, will be to seek a test $\varphi$
for a given $\alpha$, $0 \le \alpha \le 1$, such that

$$(2)\qquad \sup_{\theta\in\Theta_0}\beta_\varphi(\theta) \le \alpha.$$

The left-hand side of (2) is usually known as the size of the test $\varphi$. Condition (1)
therefore restricts attention to tests whose size does not exceed a given level of
significance $\alpha$.

The following interpretation may be given to all tests $\varphi$ satisfying $\beta_\varphi(\theta) \le \alpha$ for
all $\theta \in \Theta_0$. To every $x \in \mathcal{R}_n$ we assign a number $\varphi(x)$, $0 \le \varphi(x) \le 1$, which is the
probability of rejecting $H_0$ that $X \sim f_\theta$, $\theta \in \Theta_0$, if $x$ is observed. The restriction
$\beta_\varphi(\theta) \le \alpha$ for $\theta \in \Theta_0$ then says that if $H_0$ were true, $\varphi$ rejects it with a probability
$\le \alpha$. We will call such a test a randomized test function. If $\varphi(x) = I_A(x)$, $\varphi$ will be
called a nonrandomized test. If $x \in A$, we reject $H_0$ with probability 1; and if $x \notin A$,
this probability is 0. Needless to say, $A \in \mathfrak{B}_n$.
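
The role of randomization is easiest to see for a discrete test statistic, where a nonrandomized critical region usually cannot attain a prescribed size exactly. The sketch below is an illustration under assumptions, not a construction taken from the text: it considers $X \sim b(n, p)$ with $H_0\colon p = p_0$, uses a one-sided test that rejects when $X > k$ and rejects with probability $\gamma$ when $X = k$, and the names `randomized_upper_test` and `power`, together with $n = 10$, $p_0 = 0.5$, $\alpha = 0.05$, are illustrative choices.

```python
from math import comb

def binom_pmf(x, n, p):
    """P{X = x} for X ~ b(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def randomized_upper_test(n, p0, alpha):
    """Return (k, gamma) such that P_{p0}{X > k} + gamma * P_{p0}{X = k} = alpha."""
    tail = 0.0  # running value of P_{p0}{X > k}
    for k in range(n, -1, -1):
        pk = binom_pmf(k, n, p0)
        if tail + pk > alpha:
            return k, (alpha - tail) / pk
        tail += pk
    return -1, 0.0  # alpha >= 1: reject for every x

def power(n, p, k, gamma):
    """beta(p) = E_p phi(X) for the test defined by (k, gamma)."""
    upper = sum(binom_pmf(x, n, p) for x in range(k + 1, n + 1))
    boundary = binom_pmf(k, n, p) if k >= 0 else 0.0
    return upper + gamma * boundary

n, p0, alpha = 10, 0.5, 0.05  # illustrative values
k, gamma_k = randomized_upper_test(n, p0, alpha)
size = power(n, p0, k, gamma_k)
power_at_07 = power(n, 0.7, k, gamma_k)
print(k, gamma_k, size, power_at_07)
```

Here $\varphi(x) = 1$ for $x > k$, $\varphi(k) = \gamma$, and $\varphi(x) = 0$ for $x < k$, so $E_{p_0}\varphi(X)$ equals $\alpha$ exactly, whereas the nonrandomized regions $\{X > 8\}$ and $\{X > 7\}$ have sizes on either side of 0.05.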

We next turn our attention to the type II error. 


Definition 6. Let gy be a test function for the problem (a, @o, @1). For every 
6 & O, define 


(3) By (8) = Eay(X) = Pa{reject Ho}. 


As a function of 0, By(@) is called the power function of the test y. For any @ € Qj, 
By(@) is called the power of y against the alternative 0. 


In view of Definitions 5 and 6, the problem of testing of hypotheses may now be 
reformulated. Let X ~ fe, 0 € © C Ry, © = Oo + O}. Also, letO < a < 1 be 
given. Given a sample point x, find a test g(x) such that B,(@) < a for @ € Op, and 
£,(@) is a maximum for @ € ©}. 


Definition 7. Let $\Phi_\alpha$ be the class of all tests for the problem $(\alpha, \Theta_0, \Theta_1)$. A test
$\varphi_0 \in \Phi_\alpha$ is said to be a most powerful (MP) test against an alternative $\theta \in \Theta_1$ if

\[ \beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha. \tag{4} \]

If $\Theta_1$ contains only one point, this definition suffices. If, on the other hand, $\Theta_1$
contains at least two points, as will usually be the case, we will have an MP test
corresponding to each $\theta \in \Theta_1$.


Definition 8. A test $\varphi_0 \in \Phi_\alpha$ for the problem $(\alpha, \Theta_0, \Theta_1)$ is said to be a
uniformly most powerful (UMP) test if

\[ \beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta) \quad \text{for all } \varphi \in \Phi_\alpha, \text{ uniformly in } \theta \in \Theta_1. \tag{5} \]

Thus, if $\Theta_0$ and $\Theta_1$ are both composite, the problem is to find a UMP test $\varphi$ for
the problem $(\alpha, \Theta_0, \Theta_1)$. We will see that UMP tests very frequently do not exist,
and we will have to place further restrictions on the class of all tests, $\Phi_\alpha$.
Note that, if $\varphi_1$, $\varphi_2$ are two tests and $\lambda$ is a real number, $0 < \lambda < 1$, then
$\lambda\varphi_1 + (1 - \lambda)\varphi_2$ is also a test function, and it follows that the class of all test
functions $\Phi_\alpha$ is convex.


Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs, where $\mu$ is unknown but it
is known that $\mu \in \Theta = \{\mu_0, \mu_1\}$, $\mu_0 < \mu_1$. Let $H_0\colon X_i \sim \mathcal{N}(\mu_0, 1)$, $H_1\colon X_i \sim \mathcal{N}(\mu_1, 1)$.
Both $H_0$ and $H_1$ are simple hypotheses. Intuitively, one would accept $H_0$ if
the sample mean $\bar{X}$ is "closer" to $\mu_0$ than to $\mu_1$; that is, one would reject $H_0$ if $\bar{X} > k$,
and accept $H_0$ otherwise. The constant $k$ is determined from the level requirements.
Note that under $H_0$, $\bar{X} \sim \mathcal{N}(\mu_0, 1/n)$, and under $H_1$, $\bar{X} \sim \mathcal{N}(\mu_1, 1/n)$. Given
$0 < \alpha < 1$, we have

\[ P_{\mu_0}\{\bar{X} > k\} = P\left\{ \frac{\bar{X} - \mu_0}{1/\sqrt{n}} > \frac{k - \mu_0}{1/\sqrt{n}} \right\} = P\{\text{type I error}\} = \alpha, \]

so that $k = \mu_0 + z_\alpha/\sqrt{n}$. The test, therefore, is (Fig. 1)

\[ \varphi(x) = \begin{cases} 1 & \text{if } \bar{x} > \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}, \\ 0 & \text{otherwise.} \end{cases} \]

Fig. 1. Rejection region of $H_0$ in Example 5.

Here $\bar{X}$ is known as a test statistic, and the test $\varphi$ is nonrandomized with critical
region $C = \{x\colon \bar{x} > \mu_0 + z_\alpha/\sqrt{n}\}$. Note that in this case the continuity of $\bar{X}$ (that is,
the absolute continuity of the DF of $\bar{X}$) allows us to achieve any size $\alpha$, $0 < \alpha < 1$.
The power of the test at $\mu_1$ is given by

\[ E_{\mu_1}\varphi(X) = P_{\mu_1}\left\{ \bar{X} > \mu_0 + \frac{z_\alpha}{\sqrt{n}} \right\}
 = P\left\{ \frac{\bar{X} - \mu_1}{1/\sqrt{n}} > (\mu_0 - \mu_1)\sqrt{n} + z_\alpha \right\}
 = P\{Z > z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}, \]

where $Z \sim \mathcal{N}(0, 1)$. In particular, $E_{\mu_1}\varphi(X) > \alpha$ since $\mu_1 > \mu_0$. The probability of
type II error is given by

\[ P\{\text{type II error}\} = 1 - E_{\mu_1}\varphi(X) = P\{Z \le z_\alpha - \sqrt{n}(\mu_1 - \mu_0)\}. \]

Figure 2 gives a graph of the power function $\beta_\varphi(\mu)$ of $\varphi$ for $\mu > 0$ when $\mu_0 = 0$,
and $H_1\colon \mu > 0$.
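
The cutoff and power formulas of Example 5 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_test` and the values $\mu_0 = 0$, $\mu_1 = 1$, $n = 9$, $\alpha = 0.05$ are illustrative assumptions, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_test(mu0, mu1, n, alpha):
    """Cutoff k and power at mu1 of the test that rejects H0 when the sample mean exceeds k."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)      # upper-alpha point of N(0, 1)
    k = mu0 + z_alpha / sqrt(n)                  # k = mu0 + z_alpha / sqrt(n)
    # power = P{Z > z_alpha - sqrt(n) * (mu1 - mu0)}
    power = 1 - std_normal.cdf(z_alpha - sqrt(n) * (mu1 - mu0))
    return k, power

# Hypothetical numbers chosen only for illustration.
k, power = z_test(mu0=0.0, mu1=1.0, n=9, alpha=0.05)
print(f"k = {k:.4f}, power = {power:.4f}")
```

Setting `mu1 = mu0` in this sketch returns a power equal to $\alpha$, which agrees with the size calculation above.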


Fig. 2. Power function of $\varphi$ in Example 5.


Example 6. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from $b(1, p)$, where $p$ is
unknown and $0 \le p \le 1$. Consider the simple null hypothesis $H_0\colon X_i \sim b(1, \tfrac12)$, that
is, under $H_0$, $p = \tfrac12$. Then $H_1\colon X_i \sim b(1, p)$, $p \ne \tfrac12$. A reasonable procedure would
be to compute the average number of 1's, namely, $\bar{X} = \sum_1^5 X_i/5$, and to accept $H_0$
if $|\bar{X} - \tfrac12| \le c$, where $c$ is to be determined. Let $\alpha = 0.10$. Then we would like to
choose $c$ such that the size of our test is $\alpha$, that is,

\[ 0.10 = P_{p=1/2}\left\{ \left| \bar{X} - \tfrac12 \right| > c \right\}, \]

or

\[ 0.90 = P_{p=1/2}\left\{ -5c \le \sum_1^5 X_i - \tfrac52 \le 5c \right\}
        = P_{p=1/2}\left\{ -k \le \sum_1^5 X_i - \tfrac52 \le k \right\}, \tag{6} \]

where $k = 5c$. Now $\sum_1^5 X_i \sim b(5, \tfrac12)$ under $H_0$, so that the PMF of $\sum_1^5 X_i - \tfrac52$ is
given in the following table:

    $\sum_1^5 X_i$    $\sum_1^5 X_i - \tfrac52$    $P_{p=1/2}\{\sum_1^5 X_i = i\}$

    0                 -2.5                         0.03125
    1                 -1.5                         0.15625
    2                 -0.5                         0.31250
    3                  0.5                         0.31250
    4                  1.5                         0.15625
    5                  2.5                         0.03125


Note that we cannot choose any $k$ to satisfy (6) exactly. It is clear that we have to
reject $H_0$ when $\sum X_i - \tfrac52 = \pm 2.5$, that is, when we observe $\sum X_i = 0$ or 5. The resulting size
if we use this test is $\alpha = 0.03125 + 0.03125 = 0.0625 < 0.10$. A second procedure
would be to reject $H_0$ if $\sum X_i - \tfrac52 = \pm 1.5$ or $\pm 2.5$ ($\sum X_i = 0, 1, 4, 5$), in which case the
resulting size is $\alpha = 0.0625 + 2(0.15625) = 0.375$, which is considerably larger than
0.10. If we insist on achieving $\alpha = 0.10$, a third alternative is to randomize on the
boundary. Instead of accepting or rejecting $H_0$ with probability 1 when $\sum X_i = 1$ or
4, we reject $H_0$ with probability $\gamma$, where

\[ 0.10 = P_{p=1/2}\left\{ \sum_1^5 X_i = 0 \text{ or } 5 \right\} + \gamma\, P_{p=1/2}\left\{ \sum_1^5 X_i = 1 \text{ or } 4 \right\}. \]

Thus

\[ \gamma = \frac{0.10 - 0.0625}{0.3125} = \frac{0.0375}{0.3125} = 0.12. \]

A randomized test of size $\alpha = 0.10$ is therefore given by

\[ \varphi(x) = \begin{cases}
 1 & \text{if } \sum_1^5 x_i = 0 \text{ or } 5, \\
 0.12 & \text{if } \sum_1^5 x_i = 1 \text{ or } 4, \\
 0 & \text{otherwise.}
\end{cases} \]

Fig. 3. Power function of $\varphi$ in Example 6.

The power of this test is

\[ E_p\varphi(X) = P_p\left\{ \sum_1^5 X_i = 0 \text{ or } 5 \right\} + 0.12\, P_p\left\{ \sum_1^5 X_i = 1 \text{ or } 4 \right\}, \]

where $p \ne \tfrac12$, and can be computed for any value of $p$. Figure 3 gives a graph of
$\beta_\varphi(p)$.
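
The randomization constant and the power function of Example 6 can be reproduced in a few lines. This is a minimal sketch using only the standard library; the helper names `binom_pmf`, `randomization_gamma`, and `power` are assumptions introduced for illustration. Exact arithmetic gives $\gamma = 0.0375/0.3125 = 0.12$.

```python
from math import comb

def binom_pmf(s, n, p):
    """P{S = s} for S ~ b(n, p)."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

def randomization_gamma(alpha, n=5, p0=0.5):
    """Probability of rejecting on the boundary {S = 1 or n - 1} so that the size equals alpha."""
    reject_always = binom_pmf(0, n, p0) + binom_pmf(n, n, p0)   # S = 0 or 5
    boundary = binom_pmf(1, n, p0) + binom_pmf(n - 1, n, p0)    # S = 1 or 4
    return (alpha - reject_always) / boundary

def power(p, gamma, n=5):
    """E_p phi(X) for the randomized two-sided test of Example 6."""
    return (binom_pmf(0, n, p) + binom_pmf(n, n, p)
            + gamma * (binom_pmf(1, n, p) + binom_pmf(n - 1, n, p)))

gamma = randomization_gamma(0.10)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(power(p, gamma), 4))
```

At $p = \tfrac12$ the computed power equals the size 0.10, and it increases as $p$ moves away from $\tfrac12$ in either direction.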
We conclude this section with the following remarks. 


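Remark 1. The problem of testing of hypotheses may be considered as a special
case of the general decision problem described in Section 8.8. Let $\mathcal{A} = \{a_0, a_1\}$,
where $a_0$ represents the decision to accept $H_0\colon \theta \in \Theta_0$, and $a_1$ represents the
decision to reject $H_0$. A decision function $\delta$ is a mapping of $\mathcal{R}_n$ into $\mathcal{A}$. Let us introduce
the following loss functions:

\[ L_1(\theta, a_1) = \begin{cases} 1 & \text{if } \theta \in \Theta_0, \\ 0 & \text{if } \theta \in \Theta_1, \end{cases}
   \qquad \text{and} \qquad L_1(\theta, a_0) = 0 \text{ for all } \theta, \]

and

\[ L_2(\theta, a_0) = \begin{cases} 0 & \text{if } \theta \in \Theta_0, \\ 1 & \text{if } \theta \in \Theta_1, \end{cases}
   \qquad \text{and} \qquad L_2(\theta, a_1) = 0 \text{ for all } \theta. \]

Then the minimization of $E_\theta L_2(\theta, \delta(X))$ subject to $E_\theta L_1(\theta, \delta(X)) \le \alpha$ is the
hypothesis-testing problem discussed above. We have

\[ E_\theta L_2(\theta, \delta(X)) = P_\theta\{\delta(X) = a_0\}, \quad \theta \in \Theta_1,
   \quad = P_\theta\{\text{accept } H_0 \mid H_1 \text{ true}\}, \]

and

\[ E_\theta L_1(\theta, \delta(X)) = P_\theta\{\delta(X) = a_1\}, \quad \theta \in \Theta_0,
   \quad = P_\theta\{\text{reject } H_0 \mid \theta \in \Theta_0 \text{ true}\}. \]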


Remark 2. In Example 6 we saw that the size $\alpha$ chosen is often unattainable.
The choice of a specific value of $\alpha$ is completely arbitrary and is determined by
nonstatistical considerations such as the possible consequences of rejecting $H_0$ falsely
and the economic and practical implications of the decision to reject $H_0$. An alternative
and somewhat subjective approach wherever possible is to report the P-value of
the test statistic observed. This is the smallest level $\alpha$ at which the sample statistic
observed is significant. In Example 6, let $S = \sum_1^5 X_i$. If $S = 0$ is observed, then
$P_{H_0}(S = 0) = P_0(S = 0) = 0.03125$. By symmetry, if we reject $H_0$ for $S = 0$, we
should also do so for $S = 5$, so the probability of interest is $P_0(S = 0 \text{ or } 5) = 0.0625$,
which is the P-value. If $S = 1$ is observed and we decide to reject $H_0$, we would
also do so for $S = 0$ because $S = 0$ is more extreme than $S = 1$. By symmetry
considerations,

\[ P\text{-value} = P_0(S \le 1 \text{ or } S \ge 4) = 2(0.03125 + 0.15625) = 0.375. \]

This discussion motivates Definition 9 below. Suppose that the appropriate critical
region for testing $H_0$ against $H_1$ is one-sided. That is, suppose that $C$ is either of the
form $\{T \ge c_1\}$ or $\{T \le c_2\}$, where $T$ is the test statistic.


Definition 9. The probability of observing under $H_0$ a sample outcome at least
as extreme as the one observed is called the P-value. The smaller the P-value, the
more extreme the outcome and the stronger the evidence against $H_0$.


If $\alpha$ is given, we reject $H_0$ if $P \le \alpha$ and do not reject $H_0$ if $P > \alpha$. In the
two-sided case when the critical region is of the form $C = \{|T(X)| > k\}$, the one-sided
P-value is doubled to obtain the P-value. If the distribution of $T$ is not symmetric, the
P-value is not well defined in the two-sided case, although many authors recommend
doubling the one-sided P-value.
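
The two P-values quoted in Remark 2 can be reproduced directly from the null PMF of $S$. The sketch below is illustrative only: the function name `two_sided_p_value` is an assumption, and measuring "at least as extreme" by distance from the null mean $n p_0$ relies on the symmetry of $b(5, \tfrac12)$.

```python
from math import comb

def two_sided_p_value(s_obs, n=5, p0=0.5):
    """P-value for S ~ b(n, p0): null probability of outcomes at least as far
    from the null mean n * p0 as the observed value (appropriate when p0 = 1/2)."""
    pmf = [comb(n, s) * p0**s * (1 - p0)**(n - s) for s in range(n + 1)]
    distance = abs(s_obs - n * p0)
    return sum(prob for s, prob in enumerate(pmf) if abs(s - n * p0) >= distance)

print(two_sided_p_value(0))   # S = 0 observed
print(two_sided_p_value(1))   # S = 1 observed
```

With a given $\alpha$, the decision rule of the text then reads: reject $H_0$ when the returned value is at most $\alpha$.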


PROBLEMS 9.2 


1. A sample of size 1 is taken from a population distribution $P(\lambda)$. To test $H_0\colon \lambda = 1$
against $H_1\colon \lambda = 2$, consider the nonrandomized test $\varphi(x) = 1$ if $x > 3$, and
$= 0$ if $x \le 3$. Find the probabilities of type I and type II errors and the power
of the test against $\lambda = 2$. If it is required to achieve a size equal to 0.05, how
should one modify the test $\varphi$?


2. Let $X_1, X_2, \ldots, X_n$ be a sample from a population with finite mean $\mu$ and finite
variance $\sigma^2$. Suppose that $\mu$ is not known but $\sigma$ is known, and it is required to
test $\mu = \mu_0$ against $\mu = \mu_1$ ($\mu_1 > \mu_0$). Let $n$ be sufficiently large so that the
central limit theorem holds, and consider the test

\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if } \bar{x} > k, \\ 0 & \text{if } \bar{x} \le k, \end{cases} \]

where $\bar{x} = n^{-1}\sum_{i=1}^n x_i$. Find $k$ such that the test has (approximately) size $\alpha$.
What is the power of this test at $\mu = \mu_1$? If the probabilities of type I and type II
errors are fixed at $\alpha$ and $\beta$, respectively, find the smallest sample size needed.


3. In Problem 2, if $\sigma$ is not known, find $k$ such that the test $\varphi$ has size $\alpha$.


4. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$. For testing $\mu \le \mu_0$ against
$\mu > \mu_0$, consider the test function

\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases}
 1 & \text{if } \bar{x} > \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}, \\
 0 & \text{if } \bar{x} \le \mu_0 + \dfrac{z_\alpha}{\sqrt{n}}.
\end{cases} \]

Show that the power function of $\varphi$ is a nondecreasing function of $\mu$. What is the
size of the test?


5. A sample of size 1 is taken from an exponential PDF with parameter $\theta$, that is,
$X \sim G(1, \theta)$. To test $H_0\colon \theta = 1$ against $H_1\colon \theta > 1$, the test to be used is the
nonrandomized test

\[ \varphi(x) = \begin{cases} 1 & \text{if } x > 2, \\ 0 & \text{if } x \le 2. \end{cases} \]

Find the size of the test. What is the power function?


6. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. To test $H_0\colon \sigma = \sigma_0$ against
$H_1\colon \sigma \ne \sigma_0$, it is suggested that the test

\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases}
 1 & \text{if } \sum x_i^2 > c_1 \text{ or } \sum x_i^2 < c_2, \\
 0 & \text{if } c_2 \le \sum x_i^2 \le c_1,
\end{cases} \]

be used. How will you find $c_1$ and $c_2$ such that the size of $\varphi$ is a preassigned
number $\alpha$, $0 < \alpha < 1$? What is the power function of this test?


7. An urn contains 10 marbles, of which $M$ are white and $10 - M$ are black. To test
that $M = 5$ against the alternative hypothesis that $M = 6$, one draws 3 marbles
from the urn without replacement. The null hypothesis is rejected if the sample
contains 2 or 3 white marbles; otherwise, it is accepted. Find the size of the test
and its power.


9.3 NEYMAN-PEARSON LEMMA 


In this section we prove the fundamental lemma due to Neyman and Pearson [74],
which gives a general method for finding a best (most powerful) test of a simple
hypothesis against a simple alternative. Let $\{f_\theta, \theta \in \Theta\}$, where $\Theta = \{\theta_0, \theta_1\}$, be
a family of possible distributions of $X$. Also, $f_\theta$ represents the PDF of $X$ if $X$ is a
continuous RV, and the PMF of $X$ if $X$ is of the discrete type. Let us write $f_0(x) = f_{\theta_0}(x)$
and $f_1(x) = f_{\theta_1}(x)$ for convenience.


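Theorem 1 (Neyman-Pearson Fundamental Lemma)


(a) Any test $\varphi$ of the form

\[ \varphi(x) = \begin{cases}
 1 & \text{if } f_1(x) > k f_0(x), \\
 \gamma(x) & \text{if } f_1(x) = k f_0(x), \\
 0 & \text{if } f_1(x) < k f_0(x),
\end{cases} \tag{1} \]

for some $k \ge 0$ and $0 \le \gamma(x) \le 1$, is most powerful of its size for testing
$H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$. If $k = \infty$, the test

\[ \varphi(x) = \begin{cases}
 1 & \text{if } f_0(x) = 0, \\
 0 & \text{if } f_0(x) > 0,
\end{cases} \tag{2} \]

is most powerful of size 0 for testing $H_0$ against $H_1$.
(b) Given $\alpha$, $0 \le \alpha \le 1$, there exists a test of form (1) or (2) with $\gamma(x) = \gamma$ (a
constant) for which $E_{\theta_0}\varphi(X) = \alpha$.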


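Proof. Let $\varphi$ be a test satisfying (1) and $\varphi^*$ be any test with $E_{\theta_0}\varphi^*(X) \le E_{\theta_0}\varphi(X)$.
In the continuous case

\[ \int [\varphi(x) - \varphi^*(x)][f_1(x) - k f_0(x)]\,dx
 = \left( \int_{f_1 > k f_0} + \int_{f_1 < k f_0} \right) [\varphi(x) - \varphi^*(x)][f_1(x) - k f_0(x)]\,dx. \]

For any $x \in \{f_1(x) > k f_0(x)\}$, $\varphi(x) - \varphi^*(x) = 1 - \varphi^*(x) \ge 0$, so that the integrand
is $\ge 0$. For $x \in \{f_1(x) < k f_0(x)\}$, $\varphi(x) - \varphi^*(x) = -\varphi^*(x) \le 0$, so that the integrand
is again $\ge 0$. It follows that

\[ \int [\varphi(x) - \varphi^*(x)][f_1(x) - k f_0(x)]\,dx
 = E_{\theta_1}\varphi(X) - E_{\theta_1}\varphi^*(X) - k\left(E_{\theta_0}\varphi(X) - E_{\theta_0}\varphi^*(X)\right) \ge 0, \]

which implies that

\[ E_{\theta_1}\varphi(X) - E_{\theta_1}\varphi^*(X) \ge k\left(E_{\theta_0}\varphi(X) - E_{\theta_0}\varphi^*(X)\right) \ge 0 \]

since $E_{\theta_0}\varphi^*(X) \le E_{\theta_0}\varphi(X)$.
If $k = \infty$, any test $\varphi^*$ of size 0 must vanish on the set $\{f_0(x) > 0\}$. We have

\[ E_{\theta_1}\varphi(X) - E_{\theta_1}\varphi^*(X) = \int_{\{f_0(x) = 0\}} [1 - \varphi^*(x)] f_1(x)\,dx \ge 0. \]

The proof for the discrete case requires the usual change of integral by a sum
throughout.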

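To prove (b) we need to restrict ourselves to the case where $0 < \alpha \le 1$, since the
MP size 0 test is given by (2). Let $\gamma(x) = \gamma$, and let us compute the size of a test of
form (1). We have

\[ E_{\theta_0}\varphi(X) = P_{\theta_0}\{f_1(X) > k f_0(X)\} + \gamma P_{\theta_0}\{f_1(X) = k f_0(X)\}
 = 1 - P_{\theta_0}\{f_1(X) \le k f_0(X)\} + \gamma P_{\theta_0}\{f_1(X) = k f_0(X)\}. \]

Since $P_{\theta_0}\{f_0(X) = 0\} = 0$, we may rewrite $E_{\theta_0}\varphi(X)$ as

\[ E_{\theta_0}\varphi(X) = 1 - P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k \right\} + \gamma P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} = k \right\}. \tag{3} \]

Given $0 < \alpha \le 1$, we wish to find $k$ and $\gamma$ such that $E_{\theta_0}\varphi(X) = \alpha$, that is,

\[ P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k \right\} - \gamma P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} = k \right\} = 1 - \alpha. \tag{4} \]

Note that

\[ P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k \right\} \]

is a DF so that it is a nondecreasing and right continuous function of $k$. If there exists
a $k_0$ such that

\[ P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k_0 \right\} = 1 - \alpha, \]

we choose $\gamma = 0$ and $k = k_0$. Otherwise, there exists a $k_0$ such that

\[ P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} < k_0 \right\} \le 1 - \alpha < P_{\theta_0}\left\{ \frac{f_1(X)}{f_0(X)} \le k_0 \right\}, \tag{5} \]

that is, there is a jump at $k_0$ (see Fig. 1). In this case we choose $k = k_0$ and

\[ \gamma = \frac{P_{\theta_0}\{f_1(X)/f_0(X) \le k_0\} - (1 - \alpha)}{P_{\theta_0}\{f_1(X)/f_0(X) = k_0\}}. \tag{6} \]

Since $\gamma$ given by (6) satisfies (4), and $0 \le \gamma \le 1$, the proof is complete.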
Remark 1. It is possible to show (see Problem 6) that the test given by (1) or (2)
is unique (except on a null set), that is, if $\varphi$ is an MP test of size $\alpha$ of $H_0$ against $H_1$,
it must have form (1) or (2), except perhaps for a set $A$ with $P_{\theta_0}(A) = P_{\theta_1}(A) = 0$.


Remark 2. An analysis of the proof of part (a) of Theorem 1 shows that test (1) is
MP even if $f_1$ and $f_0$ are not necessarily densities.


Theorem 2. If a sufficient statistic $T$ exists for the family $\{f_\theta\colon \theta \in \Theta\}$, $\Theta = \{\theta_0, \theta_1\}$,
the Neyman-Pearson MP test is a function of $T$.


The proof of this result is left as an exercise.


Remark 3. If the family $\{f_\theta\colon \theta \in \Theta\}$ admits a sufficient statistic, one can restrict
attention to tests based on the sufficient statistic, that is, to tests that are functions of
the sufficient statistic. If $\varphi$ is a test function and $T$ is a sufficient statistic, $E\{\varphi(X) \mid T\}$
is itself a test function, $0 \le E\{\varphi(X) \mid T\} \le 1$, and

\[ E_\theta\{E\{\varphi(X) \mid T\}\} = E_\theta\varphi(X), \]

so that $\varphi$ and $E\{\varphi \mid T\}$ have the same power function.


Example 1. Let $X$ be an RV with PMF under $H_0$ and $H_1$ given by

    $x$         1      2      3      4      5      6

    $f_0(x)$    0.01   0.01   0.01   0.01   0.01   0.95
    $f_1(x)$    0.05   0.04   0.03   0.02   0.01   0.85

Then $\lambda(x) = f_1(x)/f_0(x)$ is given by

    $x$            1    2    3    4    5    6
    $\lambda(x)$   5    4    3    2    1    0.89

If $\alpha = 0.03$, for example, then the Neyman-Pearson MP size 0.03 test rejects $H_0$ if
$\lambda(X) \ge 3$, that is, if $X \le 3$, and has power

\[ P_1(X \le 3) = 0.05 + 0.04 + 0.03 = 0.12 \]

with $P(\text{type II error}) = 1 - 0.12 = 0.88$.
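
The construction in the proof of part (b) of Theorem 1 (scan the likelihood ratio from its largest value downward and randomize at the jump) can be applied mechanically to the table of Example 1. The sketch below is illustrative: the function names are assumptions, and it presumes a finite sample space with $f_0(x) > 0$ everywhere.

```python
def neyman_pearson(f0, f1, alpha):
    """Return (k, gamma) of the MP size-alpha test for PMFs given as dicts x -> prob.
    Reject if f1/f0 > k; reject with probability gamma if f1/f0 == k."""
    ratio = {x: f1[x] / f0[x] for x in f0}           # assumes f0[x] > 0 for all x
    size_above = 0.0                                 # P0{ratio > k} accumulated so far
    for k in sorted(set(ratio.values()), reverse=True):
        mass_at_k = sum(f0[x] for x in f0 if ratio[x] == k)
        if size_above + mass_at_k >= alpha - 1e-12:  # the jump at k covers alpha
            return k, min(1.0, (alpha - size_above) / mass_at_k)
        size_above += mass_at_k
    return min(ratio.values()), 1.0                  # fallback for alpha = 1 up to rounding

def np_power(f0, f1, k, gamma):
    """Power of the test under f1."""
    ratio = {x: f1[x] / f0[x] for x in f0}
    return sum(f1[x] * (1.0 if ratio[x] > k else gamma if ratio[x] == k else 0.0)
               for x in f0)

f0 = {1: 0.01, 2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01, 6: 0.95}
f1 = {1: 0.05, 2: 0.04, 3: 0.03, 4: 0.02, 5: 0.01, 6: 0.85}
k, gamma = neyman_pearson(f0, f1, alpha=0.03)
print(k, gamma, np_power(f0, f1, k, gamma))
```

For $\alpha = 0.03$ the cutoff falls at the ratio value 3 with $\gamma = 1$ (up to floating-point rounding), which is the test "reject if $X \le 3$" with power 0.12 found above.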


Example 2. Let $X \sim \mathcal{N}(0, 1)$ under $H_0$ and $X \sim \mathcal{C}(1, 0)$ under $H_1$. To find an
MP size $\alpha$ test of $H_0$ against $H_1$, we compute

\[ \lambda(x) = \frac{f_1(x)}{f_0(x)} = \frac{(1/\pi)[1/(1 + x^2)]}{(1/\sqrt{2\pi})\,e^{-x^2/2}}
 = \sqrt{\frac{2}{\pi}}\,\frac{e^{x^2/2}}{1 + x^2}. \]

Figure 2 gives a graph of $\lambda(x)$ and we note that $\lambda$ has a local maximum at $x = 0$ and
two minima at $x = \pm 1$. Note that $\lambda(0) = 0.7979$ and $\lambda(\pm 1) = 0.6578$, so for
$k \in (0.6578, 0.7979)$, $\lambda(x) = k$ intersects the graph at four points and the critical
region is of the form $|X| \le k_1$ or $|X| \ge k_2$, where $k_1$ and $k_2$ are solutions of $\lambda(x) = k$.
For $k = 0.7979$, the critical region is of the form $|X| \ge k_0$, where $k_0$ is the positive
solution of $e^{k^2/2} = 1 + k^2$, so that $k_0 \approx 1.59$ with $\alpha = 0.1118$. For $k < 0.6578$,
$\alpha = 1$, and for $k = 0.6578$, the critical region is $|X| > 1$ with $\alpha = 0.3413$. For the
traditional level $\alpha = 0.05$, the critical region is of the form $|X| \ge 1.96$.


Fig. 2. Graph of $\lambda(x) = (2/\pi)^{1/2}[\exp(x^2/2)/(1 + x^2)]$, with $\lambda(0) = 0.7979$ and $\lambda(1) = 0.6578$ marked.
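
The constants quoted in Example 2 can be checked numerically. The following sketch is illustrative; the names `lam` and `positive_root` are assumptions, and the root is found by simple bisection on $(1, 3)$, where $\lambda$ is increasing.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def lam(x):
    """Likelihood ratio of the Cauchy C(1, 0) density to the N(0, 1) density."""
    return sqrt(2 / pi) * exp(x * x / 2) / (1 + x * x)

def positive_root(lo=1.0, hi=3.0, tol=1e-10):
    """Bisection for the solution x > 1 of lam(x) = lam(0), i.e. exp(x^2/2) = 1 + x^2."""
    target = lam(0.0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lam(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k0 = positive_root()
size = 2 * (1 - NormalDist().cdf(k0))   # P{|Z| >= k0} under H0
print(round(lam(0.0), 4), round(lam(1.0), 4), round(k0, 3), round(size, 4))
```

The computed values agree with the text to about two decimal places; small differences in the third decimal come from rounding $k_0$ to 1.59 before evaluating the size.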


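Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let $H_0\colon p = p_0$,
$H_1\colon p = p_1$, $p_1 > p_0$. The MP size $\alpha$ test of $H_0$ against $H_1$ is of the form

\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases}
 1, & \lambda(x) = \dfrac{p_1^{\sum x_i}(1 - p_1)^{n - \sum x_i}}{p_0^{\sum x_i}(1 - p_0)^{n - \sum x_i}} > k, \\
 \gamma, & \lambda(x) = k, \\
 0, & \lambda(x) < k,
\end{cases} \]

where $k$ and $\gamma$ are determined from

\[ E_{p_0}\varphi(X) = \alpha. \]

Now

\[ \lambda(x) = \left( \frac{p_1}{p_0} \right)^{\sum x_i} \left( \frac{1 - p_1}{1 - p_0} \right)^{n - \sum x_i}, \]

and since $p_1 > p_0$, $\lambda(x)$ is an increasing function of $\sum x_i$. It follows that $\lambda(x) > k$
if and only if $\sum x_i > k_1$, where $k_1$ is a constant. Thus the MP size $\alpha$ test is of the
form

\[ \varphi(x) = \begin{cases}
 1 & \text{if } \sum x_i > k_1, \\
 \gamma & \text{if } \sum x_i = k_1, \\
 0 & \text{otherwise.}
\end{cases} \]

Also, $k_1$ and $\gamma$ are determined from

\[ \alpha = E_{p_0}\varphi(X) = P_{p_0}\left\{ \sum_1^n X_i > k_1 \right\} + \gamma P_{p_0}\left\{ \sum_1^n X_i = k_1 \right\}
 = \sum_{r = k_1 + 1}^{n} \binom{n}{r} p_0^r (1 - p_0)^{n - r} + \gamma \binom{n}{k_1} p_0^{k_1} (1 - p_0)^{n - k_1}. \]

Note that the MP size $\alpha$ test is independent of $p_1$ as long as $p_1 > p_0$; that is, it
remains an MP size $\alpha$ test against any $p > p_0$ and is therefore a UMP test of $p = p_0$
against $p > p_0$.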

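In particular, let $n = 5$, $p_0 = \tfrac12$, $p_1 = \tfrac34$, and $\alpha = 0.05$. Then the MP test is given
by

\[ \varphi(x) = \begin{cases}
 1, & \sum x_i > k, \\
 \gamma, & \sum x_i = k, \\
 0, & \sum x_i < k,
\end{cases} \]

where $k$ and $\gamma$ are determined from

\[ 0.05 = \sum_{r = k + 1}^{5} \binom{5}{r} \left( \tfrac12 \right)^5 + \gamma \binom{5}{k} \left( \tfrac12 \right)^5. \]

It follows that $k = 4$ and $\gamma = (0.05 - 0.03125)/0.15625 = 0.12$. Thus the MP size $\alpha = 0.05$ test is to reject
$p = \tfrac12$ in favor of $p = \tfrac34$ if $\sum_1^5 X_i = 5$ and reject $p = \tfrac12$ with probability 0.12 if
$\sum_1^5 X_i = 4$.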
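
The constants $k$ and $\gamma$ of the one-sided binomial test can be found by accumulating the upper tail of $b(n, p_0)$ until it first exceeds $\alpha$. The sketch below is illustrative; the function name `ump_binomial_test` is an assumption. For $n = 5$, $p_0 = \tfrac12$, $\alpha = 0.05$, exact arithmetic gives $\gamma = 0.01875/0.15625 = 0.12$.

```python
from math import comb

def ump_binomial_test(n, p0, alpha):
    """Return (k, gamma): reject p = p0 if S > k, and with probability gamma if S == k,
    where S ~ b(n, p0) under H0, so that the size is exactly alpha."""
    pmf = [comb(n, r) * p0**r * (1 - p0)**(n - r) for r in range(n + 1)]
    tail = 0.0                                   # P{S > k} under p0
    for k in range(n, -1, -1):
        if tail + pmf[k] > alpha:                # adding {S = k} would overshoot alpha
            return k, (alpha - tail) / pmf[k]
        tail += pmf[k]
    return -1, 0.0                               # alpha = 1: always reject

k, gamma = ump_binomial_test(n=5, p0=0.5, alpha=0.05)
print(k, gamma)
```

Because the search uses only $p_0$, the same pair $(k, \gamma)$ serves against every alternative $p_1 > p_0$, which is the UMP property noted in the example.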
It is simply a matter of reversing inequalities to see that the MP size $\alpha$ test of
$H_0\colon p = p_0$ against $H_1\colon p = p_1$ ($p_1 < p_0$) is given by

\[ \varphi(x) = \begin{cases}
 1 & \text{if } \sum x_i < k, \\
 \gamma & \text{if } \sum x_i = k, \\
 0 & \text{if } \sum x_i > k,
\end{cases} \]

where $\gamma$ and $k$ are determined from $E_{p_0}\varphi(X) = \alpha$.
We note that $T(X) = \sum X_i$ is minimal sufficient for $p$, so that in view of Remark
3, we could have considered tests based only on $T$. Since $T \sim b(n, p)$,

\[ \lambda(t) = \frac{f_1(t)}{f_0(t)} = \frac{\binom{n}{t} p_1^t (1 - p_1)^{n - t}}{\binom{n}{t} p_0^t (1 - p_0)^{n - t}}
 = \left( \frac{p_1}{p_0} \right)^{t} \left( \frac{1 - p_1}{1 - p_0} \right)^{n - t}, \]

so that an MP test is of the same form as above but the computation is somewhat
simpler.
We remark that in both cases ($p_1 > p_0$, $p_1 < p_0$) the MP test is quite intuitive.
We would tend to accept the larger probability if a larger number of "successes"
showed up, and the smaller probability if a smaller number of "successes" were
observed. See, however, Example 2.


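Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are
unknown. We wish to test the null hypothesis $H_0\colon \mu = \mu_0$, $\sigma^2 = \sigma_0^2$ against the
alternative $H_1\colon \mu = \mu_1$, $\sigma^2 = \sigma_0^2$. The fundamental lemma leads to the following
MP test:

\[ \varphi(x) = \begin{cases} 1 & \text{if } \lambda(x) > k, \\ 0 & \text{if } \lambda(x) \le k, \end{cases} \]

where

\[ \lambda(x) = \frac{(1/\sigma_0\sqrt{2\pi})^n \exp\{-[\sum (x_i - \mu_1)^2 / 2\sigma_0^2]\}}
                     {(1/\sigma_0\sqrt{2\pi})^n \exp\{-[\sum (x_i - \mu_0)^2 / 2\sigma_0^2]\}}, \]

and $k$ is determined from $E_{\mu_0, \sigma_0}\varphi(X) = \alpha$. We have

\[ \lambda(x) = \exp\left\{ \sum x_i \left( \frac{\mu_1}{\sigma_0^2} - \frac{\mu_0}{\sigma_0^2} \right)
   + n \left( \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_0^2} \right) \right\}. \]

If $\mu_1 > \mu_0$, then

\[ \lambda(x) > k \quad \text{if and only if} \quad \sum_{i=1}^n x_i > k', \]

where $k'$ is determined from

\[ \alpha = P_{\mu_0, \sigma_0}\left\{ \sum_{i=1}^n X_i > k' \right\}
 = P\left\{ \frac{\sum X_i - n\mu_0}{\sqrt{n}\,\sigma_0} > \frac{k' - n\mu_0}{\sqrt{n}\,\sigma_0} \right\}, \]

giving $k' = z_\alpha\sqrt{n}\,\sigma_0 + n\mu_0$. The case $\mu_1 < \mu_0$ is treated similarly. If $\sigma_0$ is known,
the test determined above is independent of $\mu_1$ as long as $\mu_1 > \mu_0$, and it follows
that the test is UMP against $H_1'\colon \mu > \mu_0$, $\sigma^2 = \sigma_0^2$. If, however, $\sigma_0$ is not known, that
is, the null hypothesis is a composite hypothesis $H_0''\colon \mu = \mu_0$, $\sigma^2 > 0$ to be tested
against the alternatives $H_1''\colon \mu = \mu_1$, $\sigma^2 > 0$ ($\mu_1 > \mu_0$), the MP test determined
above depends on $\sigma^2$. In other words, an MP test against the alternative $(\mu_1, \sigma_0^2)$ will
not be MP against $(\mu_1, \sigma_1^2)$, where $\sigma_1^2 \ne \sigma_0^2$.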
PROBLEMS 9.3


1. A sample of size 1 is taken from PDF

\[ f_\theta(x) = \begin{cases} \dfrac{2}{\theta^2}(\theta - x) & \text{if } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases} \]

Find an MP test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ ($\theta_1 < \theta_0$).


2. Find the Neyman-Pearson size $\alpha$ test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta = \theta_1$ ($\theta_1 < \theta_0$),
based on a sample of size 1 from the PDF

\[ f_\theta(x) = 2\theta x + 2(1 - \theta)(1 - x), \quad 0 < x < 1, \quad \theta \in [0, 1]. \]


3. Find the Neyman-Pearson size $\alpha$ test of $H_0\colon \beta = 1$ against $H_1\colon \beta = \beta_1$ ($> 1$),
based on a sample of size 1 from

\[ f(x; \beta) = \begin{cases} \beta x^{\beta - 1}, & 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases} \]


4. Find an MP size $\alpha$ test of $H_0\colon X \sim f_0(x)$, where $f_0(x) = (2\pi)^{-1/2} e^{-x^2/2}$,
$-\infty < x < \infty$, against $H_1\colon X \sim f_1(x)$, where $f_1(x) = 2^{-1} e^{-|x|}$, $-\infty < x < \infty$,
based on a sample of size 1.


5. For the PDF $f_\theta(x) = e^{-(x - \theta)}$, $x \ge \theta$, find an MP size $\alpha$ test of $\theta = \theta_0$ against
$\theta = \theta_1$ ($> \theta_0$), based on a sample of size $n$.


6. If $\varphi^*$ is an MP size $\alpha$ test of $H_0\colon X \sim f_0(x)$ against $H_1\colon X \sim f_1(x)$, show that it
has to be either of form (1) or form (2) (except for a set of $x$ that has probability 0
under $H_0$ and $H_1$).


7. Let $\varphi^*$ be an MP size $\alpha$ ($0 < \alpha < 1$) test of $H_0$ against $H_1$, and let $k(\alpha)$ denote
the value of $k$ in (1). Show that if $\alpha_1 < \alpha_2$, then $k(\alpha_2) \le k(\alpha_1)$.


8. For the family of Neyman-Pearson tests, show that the larger the $\alpha$, the smaller
the $\beta$ ($= P\{\text{type II error}\}$).


9. Let $1 - \beta$ be the power of an MP size $\alpha$ test, where $0 < \alpha < 1$. Show that
$\alpha < 1 - \beta$ unless $P_{\theta_0} = P_{\theta_1}$.


10. Let $\alpha$ be a real number, $0 < \alpha < 1$, and $\varphi^*$ be an MP size $\alpha$ test of $H_0$ against
$H_1$. Also, let $\beta = E_{H_1}\varphi^*(X) < 1$. Show that $1 - \varphi^*$ is an MP test for testing $H_1$
against $H_0$ at level $1 - \beta$.


11. Let $X_1, X_2, \ldots, X_n$ be a random sample from the PDF

\[ f_\theta(x) = \frac{\theta}{x^2} \quad \text{if } 0 < \theta \le x < \infty. \]

Find an MP test of $\theta = \theta_0$ against $\theta = \theta_1$ ($\ne \theta_0$).


12. Let $X$ be an observation in $(0, 1)$. Find an MP size $\alpha$ test of $H_0\colon X \sim f(x) = 4x$
if $0 < x < \tfrac12$, and $= 4 - 4x$ if $\tfrac12 \le x < 1$, against $H_1\colon X \sim f(x) = 1$ if
$0 < x < 1$. Find the power of your test.


13. In each of the following cases of simple versus simple hypotheses $H_0\colon X \sim f_0$,
$H_1\colon X \sim f_1$, draw a graph of the ratio $\lambda(x) = f_1(x)/f_0(x)$ and find the form of
the Neyman-Pearson test:

(a) $f_0(x) = \tfrac12 \exp(-|x + 1|)$; $f_1(x) = \tfrac12 \exp(-|x - 1|)$.

(b) $f_0(x) = \tfrac12 \exp(-|x|)$; $f_1(x) = 1/[\pi(1 + x^2)]$.

(c) $f_0(x) = (1/\pi)[1 + (1 + x)^2]^{-1}$; $f_1(x) = (1/\pi)[1 + (1 - x)^2]^{-1}$.

14. Let $X_1, X_2, \ldots, X_n$ be a random sample with common PDF


Find a size $\alpha$ MP test for testing $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta = \theta_1$ ($> \theta_0$).
15. Let X ~ f;, j = 0, 1, where 


tad 


— 
> 


So(x) 
fix) 


aie we | 
ah vl | Oo 
Ae wre | nN 


Aim te 
St te 


(a) Find the form of the MP test of its size. 
(b) Find the size and the power of your test for various values of the cutoff point. 


(c) Consider now a random sample of size n from fo under Ho or f; under Hy. 
Find the form of the MP test of its size. 


9.4 FAMILIES WITH MONOTONE LIKELIHOOD RATIO 


In this section we consider the problem of testing one-sided hypotheses on a single
real-valued parameter. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathcal{R}$, and
suppose that we wish to test $H_0\colon \theta \le \theta_0$ against the alternatives $H_1\colon \theta > \theta_0$ or
its dual, $H_0'\colon \theta \ge \theta_0$, against $H_1'\colon \theta < \theta_0$. In general, it is not possible to find a
UMP test for this problem. The MP test of $H_0\colon \theta \le \theta_0$, say, against the alternative
$\theta = \theta_1$ ($> \theta_0$) depends on $\theta_1$ and cannot be UMP. Here we consider a special class
of distributions that is large enough to include the one-parameter exponential family,
for which a UMP test of a one-sided hypothesis exists.


Definition 1. Let $\{f_\theta, \theta \in \Theta\}$ be a family of PDFs (PMFs), $\Theta \subseteq \mathcal{R}$. We say that
$\{f_\theta\}$ has a monotone likelihood ratio (MLR) in statistic $T(x)$ if for $\theta_1 < \theta_2$, whenever
$f_{\theta_1}$, $f_{\theta_2}$ are distinct, the ratio $f_{\theta_2}(x)/f_{\theta_1}(x)$ is a nondecreasing function of $T(x)$ for
the set of values $x$ for which at least one of $f_{\theta_1}$ and $f_{\theta_2}$ is $> 0$.


It is also possible to define families of densities with nonincreasing MLR in $T(x)$,
but such families can be treated by symmetry.


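Example 1. Let $X_1, X_2, \ldots, X_n \sim U[0, \theta]$, $\theta > 0$. The joint PDF of $X_1, \ldots, X_n$
is

\[ f_\theta(x) = \begin{cases} \dfrac{1}{\theta^n}, & 0 \le \max x_i \le \theta, \\ 0, & \text{otherwise.} \end{cases} \]

Let $\theta_2 > \theta_1$ and consider the ratio

\[ \frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{(1/\theta_2^n)\, I_{[\max x_i \le \theta_2]}}{(1/\theta_1^n)\, I_{[\max x_i \le \theta_1]}}
 = \left( \frac{\theta_1}{\theta_2} \right)^n \frac{I_{[\max x_i \le \theta_2]}}{I_{[\max x_i \le \theta_1]}}. \]

Let

\[ R(x) = \frac{I_{[\max x_i \le \theta_2]}}{I_{[\max x_i \le \theta_1]}}
 = \begin{cases} 1, & \max x_i \in [0, \theta_1], \\ \infty, & \max x_i \in (\theta_1, \theta_2]. \end{cases} \]

Define $R(x) = \infty$ if $\max x_i > \theta_2$. It follows that $f_{\theta_2}/f_{\theta_1}$ is a nondecreasing function
of $\max_{1 \le i \le n} x_i$, and the family of uniform densities on $[0, \theta]$ has an MLR in
$\max_{1 \le i \le n} x_i$.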

Theorem 1. The one-parameter exponential family

\[ f_\theta(x) = \exp[Q(\theta)T(x) + S(x) + D(\theta)], \tag{1} \]

where $Q(\theta)$ is nondecreasing, has an MLR in $T(x)$.


The proof is left as an exercise.


Remark 1. The nondecreasingness of $Q(\theta)$ can be obtained by a reparametrization,
putting $\vartheta = Q(\theta)$, if necessary.


Theorem 1 includes normal, binomial, Poisson, gamma (one parameter fixed),
beta (one parameter fixed), and so on. In Example 1 we have already seen that
$U[0, \theta]$, which is not an exponential family, has an MLR.

Example 2. Let $X \sim \mathcal{C}(1, \theta)$. Then

\[ \frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{1 + (x - \theta_1)^2}{1 + (x - \theta_2)^2} \to 1 \quad \text{as } x \to \pm\infty, \]

and we see that $\mathcal{C}(1, \theta)$ does not have an MLR.
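
A quick numerical look makes the failure of monotonicity in the Cauchy case concrete. The sketch below evaluates the ratio for the illustrative choice $\theta_1 = 0$, $\theta_2 = 1$ (assumed values, not from the text) at a few points.

```python
def cauchy_ratio(x, theta1, theta2):
    """Ratio f_theta2(x) / f_theta1(x) for the Cauchy location family C(1, theta)."""
    return (1 + (x - theta1) ** 2) / (1 + (x - theta2) ** 2)

# Illustrative parameter values theta1 = 0 < theta2 = 1.
points = (-50, 0, 1, 50)
values = [cauchy_ratio(x, 0.0, 1.0) for x in points]
for x, v in zip(points, values):
    print(x, round(v, 4))
```

The ratio falls between $x = -50$ and $x = 0$, rises to its value at $x = 1$, and falls again toward 1 for large $x$, so it cannot be a nondecreasing function of $x$.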


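Theorem 2. Let $X \sim f_\theta$, $\theta \in \Theta$, where $\{f_\theta\}$ has an MLR in $T(x)$. For testing
$H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$, $\theta_0 \in \Theta$, any test of the form

\[ \varphi(x) = \begin{cases}
 1 & \text{if } T(x) > t_0, \\
 \gamma & \text{if } T(x) = t_0, \\
 0 & \text{if } T(x) < t_0,
\end{cases} \tag{2} \]

has a nondecreasing power function and is UMP of its size $E_{\theta_0}\varphi(X) = \alpha$ (provided
that the size is not 0).

Moreover, for every $0 \le \alpha \le 1$ and every $\theta_0 \in \Theta$, there exists a $t_0$, $-\infty \le t_0 \le \infty$,
and $0 \le \gamma \le 1$ such that the test described in (2) is the UMP size $\alpha$ test of $H_0$
against $H_1$.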


Proof. Let $\theta_1, \theta_2 \in \Theta$, $\theta_1 < \theta_2$. By the fundamental lemma, any test of the form

\[ \varphi(x) = \begin{cases}
 1, & \lambda(x) > k, \\
 \gamma(x), & \lambda(x) = k, \\
 0, & \lambda(x) < k,
\end{cases} \tag{3} \]

where $\lambda(x) = f_{\theta_2}(x)/f_{\theta_1}(x)$, is MP of its size for testing $\theta = \theta_1$ against $\theta = \theta_2$,
provided that $0 \le k < \infty$; and if $k = \infty$, the test

\[ \varphi(x) = \begin{cases}
 1 & \text{if } f_{\theta_1}(x) = 0, \\
 0 & \text{if } f_{\theta_1}(x) > 0,
\end{cases} \tag{4} \]

is MP of size 0. Since $f_\theta$ has an MLR in $T$, it follows that any test of form (2) is also
of form (3), provided that $E_{\theta_1}\varphi(X) > 0$, that is, provided that its size is $> 0$. The
trivial test $\varphi'(x) \equiv \alpha$ has size $\alpha$ and power $\alpha$, so that the power of any test (2) is at
least $\alpha$, that is,

\[ E_{\theta_2}\varphi(X) \ge E_{\theta_2}\varphi'(X) = \alpha = E_{\theta_1}\varphi(X). \]

It follows that if $\theta_1 < \theta_2$ and $E_{\theta_1}\varphi(X) > 0$, then $E_{\theta_1}\varphi(X) \le E_{\theta_2}\varphi(X)$, as asserted.

Let $\theta_1 = \theta_0$ and $\theta_2 > \theta_0$, as above. We know that (2) is an MP test of its size
$E_{\theta_0}\varphi(X)$ for testing $\theta = \theta_0$ against $\theta = \theta_2$ ($\theta_2 > \theta_0$), provided that $E_{\theta_0}\varphi(X) > 0$.
Since the power function of $\varphi$ is nondecreasing,

\[ E_\theta\varphi(X) \le E_{\theta_0}\varphi(X) = \alpha_0 \quad \text{for all } \theta \le \theta_0. \tag{5} \]

Since, however, $\varphi$ does not depend on $\theta_2$ (it depends only on constants $k$ and $\gamma$), it
follows that $\varphi$ is the UMP size $\alpha_0$ test for testing $\theta = \theta_0$ against $\theta > \theta_0$. Thus $\varphi$ is
UMP among the class of tests $\varphi''$ for which

\[ E_{\theta_0}\varphi''(X) \le E_{\theta_0}\varphi(X) = \alpha_0. \tag{6} \]

Now the class of tests satisfying (5) is contained in the class of tests satisfying (6)
[there are more restrictions in (5)]. It follows that $\varphi$, which is UMP in the larger class
satisfying (6), must also be UMP in the smaller class satisfying (5). Thus, provided
that $\alpha_0 > 0$, $\varphi$ is the UMP size $\alpha_0$ test for $\theta \le \theta_0$ against $\theta > \theta_0$.

We ask the reader to complete the proof of the final part of the theorem, using the
fundamental lemma.


Remark 2. By interchanging inequalities throughout in Theorem 2, we see that
this theorem also provides a solution of the dual problem $H_0'\colon \theta \ge \theta_0$ against
$H_1'\colon \theta < \theta_0$.


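Example 3. Let $X$ have the hypergeometric PMF

\[ P_M\{X = x\} = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, M. \]

Since

\[ \frac{P_{M+1}\{X = x\}}{P_M\{X = x\}} = \frac{M + 1}{N - M} \cdot \frac{N - M - n + x}{M + 1 - x}, \]

we see that $\{P_M\}$ has an MLR in $x$ ($P_{M_2}/P_{M_1}$, where $M_2 > M_1$, is just a product
of such ratios). It follows that there exists a UMP test of $H_0\colon M \le M_0$ against
$H_1\colon M > M_0$, which rejects $H_0$ when $X$ is too large; that is, the UMP size $\alpha$ test is
given by

\[ \varphi(x) = \begin{cases}
 1, & x > k, \\
 \gamma, & x = k, \\
 0, & x < k,
\end{cases} \]

where (integer) $k$ and $\gamma$ are determined from

\[ E_{M_0}\varphi(X) = \alpha. \]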
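
The condition $E_{M_0}\varphi(X) = \alpha$ can be solved for $k$ and $\gamma$ by accumulating the upper tail of the hypergeometric PMF at $M = M_0$. The sketch below is illustrative only; the function names and the numbers $N = 10$, $M_0 = 5$, $n = 3$, $\alpha = 0.10$ are assumptions chosen for the demonstration.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P_M{X = x}: x marked items in a sample of n drawn without replacement
    from N items of which M are marked."""
    if x < max(0, n - (N - M)) or x > min(M, n):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def ump_hypergeom_test(N, M0, n, alpha):
    """Return (k, gamma): reject H0: M <= M0 if X > k, and with probability gamma if X == k."""
    tail = 0.0                                   # P_{M0}{X > k}
    for k in range(n, -1, -1):
        p = hypergeom_pmf(k, N, M0, n)
        if tail + p > alpha:
            return k, (alpha - tail) / p
        tail += p
    return -1, 0.0                               # alpha = 1: always reject

k, gamma = ump_hypergeom_test(N=10, M0=5, n=3, alpha=0.10)
print(k, gamma)
```

With these illustrative numbers, $P_5\{X = 3\} = 10/120$ and $P_5\{X = 2\} = 50/120$, so the search stops at $k = 2$ with $\gamma = (0.10 - 10/120)/(50/120) = 0.04$.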


For the one-parameter exponential family, UMP tests also exist for some two-sided
hypotheses of the form

\[ H_0\colon \theta \le \theta_1 \text{ or } \theta \ge \theta_2 \quad (\theta_1 < \theta_2). \tag{7} \]

We state the following result without proof.

Theorem 3. For the one-parameter exponential family (1), there exists a UMP
test of the hypothesis $H_0\colon \theta \le \theta_1$ or $\theta \ge \theta_2$ ($\theta_1 < \theta_2$) against $H_1\colon \theta_1 < \theta < \theta_2$ that
is of the form

\[ \varphi(x) = \begin{cases}
 1 & \text{if } c_1 < T(x) < c_2, \\
 \gamma_i & \text{if } T(x) = c_i, \ i = 1, 2 \ (c_1 < c_2), \\
 0 & \text{if } T(x) < c_1 \text{ or } > c_2,
\end{cases} \tag{8} \]

where the $c$'s and the $\gamma$'s are given by

\[ E_{\theta_1}\varphi(X) = E_{\theta_2}\varphi(X) = \alpha. \tag{9} \]

See Lehmann [63, pp. 101-103] for proof.

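Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. To test $H_0\colon \mu \le \mu_0$ or
$\mu \ge \mu_1$ ($\mu_1 > \mu_0$) against $H_1\colon \mu_0 < \mu < \mu_1$, the UMP test is given by

\[ \varphi(x) = \begin{cases}
 1 & \text{if } c_1 < \sum x_i < c_2, \\
 \gamma_i & \text{if } \sum x_i = c_1 \text{ or } c_2, \\
 0 & \text{if } \sum x_i < c_1 \text{ or } > c_2,
\end{cases} \]

where we determine $c_1$, $c_2$ from

\[ \alpha = P_{\mu_0}\left\{ c_1 < \sum X_i < c_2 \right\} = P_{\mu_1}\left\{ c_1 < \sum X_i < c_2 \right\} \]

and $\gamma_1 = \gamma_2 = 0$. Thus

\[ \alpha = P\left\{ \frac{c_1 - n\mu_0}{\sqrt{n}} < \frac{\sum X_i - n\mu_0}{\sqrt{n}} < \frac{c_2 - n\mu_0}{\sqrt{n}} \right\}
 = P\left\{ \frac{c_1 - n\mu_1}{\sqrt{n}} < \frac{\sum X_i - n\mu_1}{\sqrt{n}} < \frac{c_2 - n\mu_1}{\sqrt{n}} \right\} \]
\[ \phantom{\alpha} = P\left\{ \frac{c_1 - n\mu_0}{\sqrt{n}} < Z < \frac{c_2 - n\mu_0}{\sqrt{n}} \right\}
 = P\left\{ \frac{c_1 - n\mu_1}{\sqrt{n}} < Z < \frac{c_2 - n\mu_1}{\sqrt{n}} \right\}, \]

where $Z$ is $\mathcal{N}(0, 1)$. Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$ from the
simultaneous equations

\[ \Phi\left( \frac{c_2 - n\mu_0}{\sqrt{n}} \right) - \Phi\left( \frac{c_1 - n\mu_0}{\sqrt{n}} \right) = \alpha, \qquad
   \Phi\left( \frac{c_2 - n\mu_1}{\sqrt{n}} \right) - \Phi\left( \frac{c_1 - n\mu_1}{\sqrt{n}} \right) = \alpha, \]

where $\Phi$ is the DF of $Z$.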
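
One way to solve the two simultaneous equations numerically is sketched below. It rests on an observation that is an assumption of this sketch rather than a statement from the text: by the symmetry of the normal DF, cutoffs placed symmetrically about the midpoint $n(\mu_0 + \mu_1)/2$ make the two probabilities equal automatically, which leaves a single equation in the half-width, solved here by bisection. The function name `two_sided_cutoffs` and the numerical inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_cutoffs(mu0, mu1, n, alpha, tol=1e-12):
    """Cutoffs c1 < c2 with P_mu0{c1 < sum X_i < c2} = P_mu1{c1 < sum X_i < c2} = alpha,
    searched among intervals symmetric about n * (mu0 + mu1) / 2."""
    Phi = NormalDist().cdf
    d = sqrt(n) * (mu1 - mu0)                    # standardized distance between the means
    lo, hi = 0.0, 10.0 + d                       # bracket for the standardized half-width h
    while hi - lo > tol:
        h = (lo + hi) / 2
        if Phi(d / 2 + h) - Phi(d / 2 - h) < alpha:
            lo = h
        else:
            hi = h
    h = (lo + hi) / 2
    center = n * (mu0 + mu1) / 2
    return center - sqrt(n) * h, center + sqrt(n) * h

# Illustrative inputs only.
c1, c2 = two_sided_cutoffs(mu0=0.0, mu1=1.0, n=4, alpha=0.05)
print(round(c1, 4), round(c2, 4))
```

Substituting the returned $c_1$, $c_2$ into both displayed equations gives $\alpha$ on each side up to the bisection tolerance.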


Remark 3. We caution the reader that UMP tests for testing $H_0\colon \theta_1 \le \theta \le \theta_2$
and $H_0'\colon \theta = \theta_0$ for the one-parameter exponential family do not exist. An example
will suffice.


Example 5. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(0, \sigma^2)$. Since the family of
joint PDFs of $X = (X_1, \ldots, X_n)$ has an MLR in $T(X) = \sum_1^n X_i^2$, it follows that
UMP tests exist for one-sided hypotheses $\sigma \ge \sigma_0$ and $\sigma \le \sigma_0$.

Consider now the null hypothesis $H_0\colon \sigma = \sigma_0$ against the alternative $H_1\colon \sigma \ne \sigma_0$.
We will show that a UMP test of $H_0$ does not exist. For testing $\sigma = \sigma_0$ against
$\sigma > \sigma_0$, a test of the form

\[ \varphi_1(x) = \begin{cases} 1, & \sum_1^n x_i^2 > c_1, \\ 0, & \text{otherwise,} \end{cases} \]

is UMP, and for testing $\sigma = \sigma_0$ against $\sigma < \sigma_0$, a test of the form

\[ \varphi_2(x) = \begin{cases} 1, & \sum_1^n x_i^2 < c_2, \\ 0, & \text{otherwise,} \end{cases} \]

is UMP. If the size is chosen as $\alpha$, then $c_1 = \sigma_0^2\chi^2_{n,\alpha}$ and $c_2 = \sigma_0^2\chi^2_{n,1-\alpha}$. Clearly,
neither $\varphi_1$ nor $\varphi_2$ is UMP for $H_0$ against $H_1\colon \sigma \ne \sigma_0$. The power of any test of $H_0$
for values $\sigma > \sigma_0$ cannot exceed that of $\varphi_1$, and for values of $\sigma < \sigma_0$ it cannot
exceed the power of test $\varphi_2$. Hence no test of $H_0$ can be UMP (see Fig. 1).

Fig. 1. Power functions of chi-square tests of $H_0\colon \sigma = \sigma_0$ against $H_1$.


PROBLEMS 9.4 


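1. For the following families of PMFs (PDFs) $f_\theta(x)$, $\theta \in \Theta \subseteq \mathcal{R}$, find a UMP size
$\alpha$ test of $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$, based on a sample of $n$ observations:

(a) $f_\theta(x) = \theta^x(1 - \theta)^{1 - x}$, $x = 0, 1$; $0 < \theta < 1$.

(b) $f_\theta(x) = (1/\sqrt{2\pi}) \exp[-(x - \theta)^2/2]$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.

(c) $f_\theta(x) = e^{-\theta}(\theta^x/x!)$, $x = 0, 1, 2, \ldots$; $\theta > 0$.

(d) $f_\theta(x) = (1/\theta)e^{-x/\theta}$, $x > 0$, $\theta > 0$.

(e) $f_\theta(x) = [1/\Gamma(\theta)]x^{\theta - 1}e^{-x}$, $x > 0$, $\theta > 0$.

(f) $f_\theta(x) = \theta x^{\theta - 1}$, $0 < x < 1$, $\theta > 0$.


2. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from the PMF

\[ P_N(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N; \ N \in \{1, 2, \ldots\}. \]

(a) Show that the test

\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases}
 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0, \\
 \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le N_0,
\end{cases} \]

is UMP size $\alpha$ for testing $H_0\colon N \le N_0$ against $H_1\colon N > N_0$.

(b) Show that

\[ \varphi(x_1, x_2, \ldots, x_n) = \begin{cases}
 1 & \text{if } \max(x_1, x_2, \ldots, x_n) > N_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \alpha^{1/n}N_0, \\
 0 & \text{otherwise,}
\end{cases} \]

is a UMP size $\alpha$ test of $H_0'\colon N = N_0$ against $H_1'\colon N \ne N_0$.

3. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from $U(0, \theta)$, $\theta > 0$. Show that the
test

\[ \varphi_1(x_1, x_2, \ldots, x_n) = \begin{cases}
 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0, \\
 \alpha & \text{if } \max(x_1, x_2, \ldots, x_n) \le \theta_0,
\end{cases} \]

is UMP size $\alpha$ for testing $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$ and that the test

\[ \varphi_2(x_1, x_2, \ldots, x_n) = \begin{cases}
 1 & \text{if } \max(x_1, \ldots, x_n) > \theta_0 \text{ or } \max(x_1, x_2, \ldots, x_n) \le \theta_0\alpha^{1/n}, \\
 0 & \text{otherwise,}
\end{cases} \]

is UMP size $\alpha$ for $H_0'\colon \theta = \theta_0$ against $H_1'\colon \theta \ne \theta_0$.

4. Does the Laplace family of PDFs

\[ f_\theta(x) = \tfrac12 \exp(-|x - \theta|), \quad -\infty < x < \infty, \quad \theta \in \mathcal{R}, \]

possess an MLR?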

5. Let X have logistic distribution with the PDF 
fo(x) =e PL te 9)? ER. 


Does $\{f_\theta\}$ belong to the exponential family? Does $\{f_\theta\}$ have MLR?


6. (a) Let $f_\theta$ be the PDF of a $\mathcal{N}(0, \theta)$ RV. Does $\{f_\theta\}$ have MLR?
(b) Do the same as in part (a) if $X \sim \mathcal{N}(\theta, \theta)$.


9.5 UNBIASED AND INVARIANT TESTS 
We have seen that if we restrict ourselves to the class $\Phi_\alpha$ of all size $\alpha$ tests, there
do not exist UMP tests for many important hypotheses. This suggests that we reduce
the class of tests under consideration by imposing certain restrictions.


Definition 1. A size $\alpha$ test $\varphi$ of $H_0\colon \theta \in \Theta_0$ against the alternatives $H_1\colon \theta \in \Theta_1$
is said to be unbiased if

\[ E_\theta\varphi(X) \ge \alpha \quad \text{for all } \theta \in \Theta_1. \tag{1} \]

It follows that a test $\varphi$ is unbiased if and only if its power function $\beta_\varphi(\theta)$ satisfies

\[ \beta_\varphi(\theta) \le \alpha \quad \text{for } \theta \in \Theta_0 \tag{2} \]

and

\[ \beta_\varphi(\theta) \ge \alpha \quad \text{for } \theta \in \Theta_1. \tag{3} \]

This seems to be a reasonable requirement to place on a test. An unbiased test rejects
a false $H_0$ more often than a true $H_0$.


Definition 2. Let $U_\alpha$ be the class of all unbiased size $\alpha$ tests of $H_0$. If there exists
a test $\varphi \in U_\alpha$ that has maximum power at each $\theta \in \Theta_1$, we call $\varphi$ a UMP unbiased
size $\alpha$ test.

Clearly, $U_\alpha \subset \Phi_\alpha$. If a UMP test exists in $\Phi_\alpha$, it is UMP in $U_\alpha$. This follows
by comparing the power of the UMP test with that of the trivial test $\varphi(x) \equiv \alpha$. It is
convenient to introduce another class of tests.


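Definition 3. A test $\varphi$ is said to be $\alpha$-similar on a subset $\Theta^*$ of $\Theta$ if

\[ \beta_\varphi(\theta) = E_\theta\varphi(X) = \alpha \quad \text{for } \theta \in \Theta^*. \tag{4} \]

A test is said to be similar on a set $\Theta^* \subseteq \Theta$ if it is $\alpha$-similar on $\Theta^*$ for some $\alpha$,
$0 \le \alpha \le 1$.


It is clear that there exists at least one similar test on every $\Theta^*$, namely, $\varphi(x) \equiv \alpha$,
$0 \le \alpha \le 1$.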


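Theorem 1. Let $\beta_\varphi(\theta)$ be continuous in $\theta$ for any $\varphi$. If $\varphi$ is an unbiased size $\alpha$ test
of $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$, it is $\alpha$-similar on the boundary $\Lambda = \bar\Theta_0 \cap \bar\Theta_1$.
(Here $\bar A$ is the closure of set $A$.)

Proof. Let $\theta \in \Lambda$. Then there exists a sequence $\{\theta_n\}$, $\theta_n \in \Theta_0$, such that $\theta_n \to \theta$.
Since $\beta_\varphi(\theta)$ is continuous, $\beta_\varphi(\theta_n) \to \beta_\varphi(\theta)$; and since $\beta_\varphi(\theta_n) \le \alpha$ for $\theta_n \in \Theta_0$,
$\beta_\varphi(\theta) \le \alpha$. Similarly, there exists a sequence $\{\theta_n'\}$, $\theta_n' \in \Theta_1$, such that $\beta_\varphi(\theta_n') \ge \alpha$
($\varphi$ is unbiased) and $\theta_n' \to \theta$. Thus $\beta_\varphi(\theta_n') \to \beta_\varphi(\theta)$, and it follows that $\beta_\varphi(\theta) \ge \alpha$.
Hence $\beta_\varphi(\theta) = \alpha$ for $\theta \in \Lambda$, and $\varphi$ is $\alpha$-similar on $\Lambda$.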


Remark 1. Thus if $\beta_\varphi(\theta)$ is continuous in $\theta$ for any $\varphi$, an unbiased size $\alpha$ test of
$H_0$ against $H_1$ is also $\alpha$-similar for the PDFs (PMFs) of $\Lambda$, that is, for $\{f_\theta, \theta \in \Lambda\}$. If
we can find an MP similar test of $H_0\colon \theta \in \Lambda$ against $H_1$, and if this test is unbiased
size $\alpha$, then necessarily it is MP in the smaller class.

Definition 4. A test $\varphi$ that is UMP among all $\alpha$-similar tests on the boundary
$\Lambda = \bar\Theta_0 \cap \bar\Theta_1$ is said to be a UMP $\alpha$-similar test.

It is frequently easier to find a UMP $\alpha$-similar test. Moreover, tests that are UMP
similar on the boundary are often UMP unbiased.


Theorem 2. Let the power function of every test $\varphi$ of $H_0\colon \theta \in \Theta_0$ against
$H_1\colon \theta \in \Theta_1$ be continuous in $\theta$. Then a UMP $\alpha$-similar test is UMP unbiased,
provided that its size is $\alpha$ for testing $H_0$ against $H_1$.

Proof. Let $\varphi_0$ be UMP $\alpha$-similar. Then $E_\theta \varphi_0(X) \le \alpha$ for $\theta \in \Theta_0$. Comparing
its power with that of the trivial similar test $\varphi(x) = \alpha$, we see that $\varphi_0$ is unbiased
also. By the continuity of $\beta_\varphi(\theta)$, we see that the class of all unbiased size $\alpha$ tests is a
subclass of the class of all $\alpha$-similar tests. It follows that $\varphi_0$ is a UMP unbiased size
$\alpha$ test.

Remark 2. The continuity of the power function $\beta_\varphi(\theta)$ is not always easy to check,
but sufficient conditions may be found in most advanced calculus texts (see, for
example, Widder [116, p. 356]). If the family of the PDF (PMF) $f_\theta$ is an exponential
family, a proof is given in Lehmann [63, p. 59].


Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$. We wish to test
$H_0\colon \mu \le 0$ against $H_1\colon \mu > 0$. Since the family of densities has an MLR in $\sum_{i=1}^n X_i$,
we can use Theorem 9.4.2 to conclude that a UMP test rejects $H_0$ if $\sum_{i=1}^n X_i > c$.
This test is also UMP unbiased. Nevertheless, we use this example to illustrate the
concepts introduced above.

Here $\Theta_0 = \{\mu \le 0\}$, $\Theta_1 = \{\mu > 0\}$, and $\Lambda = \bar\Theta_0 \cap \bar\Theta_1 = \{\mu = 0\}$. Since
$T(X) = \sum_{i=1}^n X_i$ is sufficient, we focus attention on tests based on $T$ alone. Note
that $T \sim N(n\mu, n)$, which is one-parameter exponential. Thus the power function
of any test $\varphi$ based on $T$ is continuous in $\mu$. It follows that any unbiased size $\alpha$ test
of $H_0$ has the property $\beta_\varphi(0) = \alpha$ of similarity over $\Lambda$. In order to use Theorem 2,
we find a UMP test of $H_0'\colon \mu \in \Lambda$ against $H_1$. Let $\mu_1 > 0$. By the fundamental
lemma, an MP test of $\mu = 0$ against $\mu = \mu_1 > 0$ is given by

$$\varphi(t) = \begin{cases} 1 & \text{if } \exp\left\{\dfrac{t^2}{2n} - \dfrac{(t - n\mu_1)^2}{2n}\right\} > k', \\ 0 & \text{otherwise,} \end{cases} \;=\; \begin{cases} 1 & \text{if } t > k, \\ 0 & \text{if } t \le k, \end{cases}$$

where $k$ is determined from

$$\alpha = P_0\{T > k\} = P\left\{Z > \frac{k}{\sqrt{n}}\right\}.$$

Thus $k = \sqrt{n}\, z_\alpha$. Since $\varphi$ is independent of $\mu_1$ as long as $\mu_1 > 0$, we see that the
test

$$\varphi(t) = \begin{cases} 1 & \text{if } t > \sqrt{n}\, z_\alpha, \\ 0 & \text{otherwise,} \end{cases}$$

is UMP $\alpha$-similar. We need only check that $\varphi$ is of the right size for testing $H_0$ against
$H_1$. We have for $\mu \le 0$,

$$E_\mu \varphi(T) = P_\mu\{T > \sqrt{n}\, z_\alpha\} = P\left\{\frac{T - n\mu}{\sqrt{n}} > z_\alpha - \sqrt{n}\,\mu\right\} \le P\{Z > z_\alpha\},$$

since $-\sqrt{n}\,\mu \ge 0$. Here $Z$ is $N(0, 1)$. It follows that

$$E_\mu \varphi(T) \le \alpha \qquad \text{for } \mu \le 0,$$

hence $\varphi$ is UMP unbiased.
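
As a numerical sanity check on Example 1, the test that rejects when $T > \sqrt{n}\,z_\alpha$ has power $1 - \Phi(z_\alpha - \sqrt{n}\,\mu)$, which is at most $\alpha$ for $\mu \le 0$ and at least $\alpha$ for $\mu > 0$. The sketch below evaluates this with Python's standard library; the function name `power` and the values $n = 25$, $\alpha = 0.05$ are illustrative assumptions, not part of the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def power(mu: float, n: int, alpha: float) -> float:
    """Power of the test 'reject H0 if sum(X_i) > sqrt(n) * z_alpha' under N(mu, 1)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper-alpha point of N(0, 1)
    # P_mu{T > sqrt(n) z_alpha} = P{Z > z_alpha - sqrt(n) mu}
    return 1.0 - STD_NORMAL.cdf(z_alpha - (n ** 0.5) * mu)


if __name__ == "__main__":
    n, alpha = 25, 0.05  # illustrative values only
    for mu in (-0.5, -0.1, 0.0, 0.1, 0.5):
        print(f"mu = {mu:+.1f}  power = {power(mu, n, alpha):.4f}")
```

The power equals $\alpha$ at the boundary $\mu = 0$ (similarity on $\Lambda$) and is increasing in $\mu$, which is the unbiasedness property in this one-sided problem.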


Theorem 2 can be used only if it is possible to find a UMP a-similar test. Unfor- 
tunately, this requires heavy use of conditional expectation, and we will not pursue 
the subject any further. We refer to Lehmann [63, Chaps. 4 and 5], and Ferguson [25, 
pp. 224-233], for further details. 

Yet another reduction is obtained if we apply the principle of invariance to
hypothesis-testing problems. We recall that a class of distributions is invariant under
a group of transformations $\mathcal{G}$ if for every $g \in \mathcal{G}$ and every $\theta \in \Theta$ there exists a
unique $\theta' \in \Theta$ such that $g(X)$ has distribution $P_{\theta'}$ whenever $X \sim P_\theta$. We write
$\theta' = \bar g\theta$.

In a hypothesis-testing problem we need to reformulate the principle of invariance.
First, we need to ensure that under transformations $\mathcal{G}$, not only does
$\mathcal{P} = \{P_\theta\colon \theta \in \Theta\}$ remain invariant but also the problem of testing $H_0\colon \theta \in \Theta_0$ against
$H_1\colon \theta \in \Theta_1$ remains invariant. Second, since the problem has not changed by application
of $\mathcal{G}$, the decision also must not change.

Definition 5. A group $\mathcal{G}$ of transformations on the space of values of $X$ leaves a
hypothesis-testing problem invariant if $\mathcal{G}$ leaves both $\{P_\theta\colon \theta \in \Theta_0\}$ and
$\{P_\theta\colon \theta \in \Theta_1\}$ invariant.

Definition 6. We say that $\varphi$ is invariant under $\mathcal{G}$ if

$$\varphi(g(x)) = \varphi(x) \qquad \text{for all } x \text{ and all } g \in \mathcal{G}.$$

Definition 7. Let $\mathcal{G}$ be a group of transformations on the space of values of the
RV $X$. We say that a statistic $T(x)$ is maximal invariant under $\mathcal{G}$ if (a) $T$ is invariant;
(b) $T$ is maximal, that is, $T(x_1) = T(x_2) \Rightarrow x_1 = g(x_2)$ for some $g \in \mathcal{G}$.

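Example 2. Let $x = (x_1, x_2, \ldots, x_n)$, and let $\mathcal{G}$ be the group of translations

$$g_c(x) = (x_1 + c, \ldots, x_n + c), \qquad -\infty < c < \infty.$$

Here the space of values of $X$ is $\mathbb{R}_n$. Consider the statistic

$$T(x) = (x_n - x_1, \ldots, x_n - x_{n-1}).$$

Clearly,

$$T(g_c(x)) = (x_n - x_1, \ldots, x_n - x_{n-1}) = T(x).$$

If $T(x) = T(x')$, then $x_n - x_i = x_n' - x_i'$, $i = 1, 2, \ldots, n - 1$, and we have
$x_i - x_i' = x_n - x_n' = c$ $(i = 1, 2, \ldots, n - 1)$; that is, $g_c(x') = (x_1' + c, \ldots, x_n' + c) = x$,
and $T$ is maximal invariant.

Next consider the group of scale changes

$$g_c(x) = (c x_1, \ldots, c x_n), \qquad c > 0.$$

Then

$$T(x) = \begin{cases} 0 & \text{if all } x_i = 0, \\ \left(\dfrac{x_1}{z}, \ldots, \dfrac{x_n}{z}\right) & \text{if at least one } x_i \ne 0, \end{cases} \qquad z = \left(\sum_{i=1}^n x_i^2\right)^{1/2},$$

is maximal invariant; for

$$T(g_c(x)) = T(c x_1, \ldots, c x_n) = T(x),$$

and if $T(x) = T(x')$, then either $T(x) = T(x') = 0$, in which case $x_i = x_i' = 0$, or
$T(x) = T(x') \ne 0$, in which case $x_i / z = x_i' / z'$, implying that $x_i' = (z'/z)\,x_i = c x_i$,
and $T$ is maximal.

Finally, if we consider the group of translation and scale changes,

$$g(x) = (a x_1 + b, \ldots, a x_n + b), \qquad a > 0,\ -\infty < b < \infty,$$

a maximal invariant is

$$T(x) = \begin{cases} 0 & \text{if } \beta = 0, \\ \left(\dfrac{x_1 - \bar x}{\beta}, \dfrac{x_2 - \bar x}{\beta}, \ldots, \dfrac{x_n - \bar x}{\beta}\right) & \text{if } \beta \ne 0, \end{cases}$$

where $\bar x = n^{-1}\sum_{i=1}^n x_i$ and $\beta = \left[n^{-1}\sum_{i=1}^n (x_i - \bar x)^2\right]^{1/2}$.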
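
The invariance half of Example 2 is easy to check numerically. The sketch below is only an illustration: the function names are hypothetical, and $\beta$ is taken to be the root-mean-square deviation so that the location-scale statistic is unchanged by $x \mapsto ax + b$ with $a > 0$.

```python
import math


def t_translation(x):
    """Candidate maximal invariant under x -> x + c: differences from the last coordinate."""
    return tuple(x[-1] - xi for xi in x[:-1])


def t_scale(x):
    """Candidate maximal invariant under x -> c*x, c > 0: x / ||x|| (0 if x = 0)."""
    z = math.sqrt(sum(xi * xi for xi in x))
    return 0 if z == 0 else tuple(xi / z for xi in x)


def t_location_scale(x):
    """Candidate maximal invariant under x -> a*x + b, a > 0: standardized residuals."""
    n = len(x)
    xbar = sum(x) / n
    beta = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    return 0 if beta == 0 else tuple((xi - xbar) / beta for xi in x)


def close(u, v, tol=1e-12):
    """Coordinatewise comparison up to floating-point rounding."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))


x = (1.0, 4.0, -2.0, 0.5)                       # arbitrary sample point
shifted = tuple(xi + 3.7 for xi in x)           # translation by c = 3.7
scaled = tuple(2.5 * xi for xi in x)            # scale change by c = 2.5
affine = tuple(2.5 * xi + 3.7 for xi in x)      # a = 2.5, b = 3.7

print(close(t_translation(x), t_translation(shifted)))
print(close(t_scale(x), t_scale(scaled)))
print(close(t_location_scale(x), t_location_scale(affine)))
```

Each statistic is unchanged by its own group but not, in general, by a larger one; for instance, the translation invariant changes under a scale change.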


Definition 8. Let $I_\alpha$ denote the class of all invariant size $\alpha$ tests of $H_0\colon \theta \in \Theta_0$
against $H_1\colon \theta \in \Theta_1$. If there exists a UMP member in $I_\alpha$, we call the test a UMP
invariant test of $H_0$ against $H_1$.

The search for UMP invariant tests is greatly facilitated by use of the following
result.

Theorem 3. Let $T(x)$ be maximal invariant with respect to $\mathcal{G}$. Then $\varphi$ is invariant
under $\mathcal{G}$ if and only if $\varphi$ is a function of $T$.

Proof. Let $\varphi$ be invariant. We have to show that $T(x_1) = T(x_2) \Rightarrow \varphi(x_1) = \varphi(x_2)$.
If $T(x_1) = T(x_2)$, there is a $g \in \mathcal{G}$ such that $x_1 = g(x_2)$, so that
$\varphi(x_1) = \varphi(g(x_2)) = \varphi(x_2)$.

Conversely, if $\varphi$ is a function of $T$, $\varphi(x) = h[T(x)]$, then

$$\varphi(g(x)) = h[T(g(x))] = h[T(x)] = \varphi(x),$$

and $\varphi$ is invariant.

Remark 3. The use of Theorem 3 is obvious. If a hypothesis-testing problem is
invariant under a group $\mathcal{G}$, the principle of invariance restricts attention to invariant
tests. According to Theorem 3, it suffices to restrict attention to test functions that
are functions of the maximal invariant $T$.


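Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$
and $\sigma^2$ are unknown. We wish to test $H_0\colon \sigma \ge \sigma_0$, $-\infty < \mu < \infty$, against
$H_1\colon \sigma < \sigma_0$, $-\infty < \mu < \infty$. The family $\{N(\mu, \sigma^2)\}$ remains invariant under
translations $x_i' = x_i + c$, $-\infty < c < \infty$. Moreover, since $\operatorname{var}(X + c) = \operatorname{var}(X)$, the
hypothesis-testing problem remains invariant under the group of translations; that is,
both $\{N(\mu, \sigma^2)\colon \sigma^2 \ge \sigma_0^2\}$ and $\{N(\mu, \sigma^2)\colon \sigma^2 < \sigma_0^2\}$ remain invariant. The joint
sufficient statistic is $(\bar X, \sum (X_i - \bar X)^2)$, which is transformed to $(\bar X + c, \sum (X_i - \bar X)^2)$
under translations. A maximal invariant is $\sum (X_i - \bar X)^2$. It follows that the class of
invariant tests consists of tests that are functions of $\sum (X_i - \bar X)^2$.

Now $\sum (X_i - \bar X)^2 / \sigma^2 \sim \chi^2(n - 1)$, so that the PDF of $Z = \sum (X_i - \bar X)^2$ is given
by

$$f_{\sigma^2}(z) = \frac{\sigma^{-(n-1)}}{\Gamma[(n-1)/2]\; 2^{(n-1)/2}}\; z^{(n-3)/2}\, e^{-z/(2\sigma^2)}, \qquad z > 0.$$

The family of densities $\{f_{\sigma^2}\colon \sigma^2 > 0\}$ has an MLR in $z$, and it follows that a UMP
test is to reject $H_0\colon \sigma^2 \ge \sigma_0^2$ if $z \le k$, that is, a UMP invariant test is given by

$$\varphi(x) = \begin{cases} 1 & \text{if } \sum (x_i - \bar x)^2 \le k, \\ 0 & \text{if } \sum (x_i - \bar x)^2 > k, \end{cases}$$

where $k$ is determined from the size restriction

$$\alpha = P_{\sigma_0}\left\{\sum (X_i - \bar X)^2 \le k\right\} = P\left\{\frac{\sum (X_i - \bar X)^2}{\sigma_0^2} \le \frac{k}{\sigma_0^2}\right\},$$

that is,

$$k = \sigma_0^2\, \chi^2_{n-1,\,1-\alpha}.$$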


Example 4. Let $X$ have PDF $f_i(x_1 - \theta, \ldots, x_n - \theta)$ under $H_i$ $(i = 0, 1)$,
$-\infty < \theta < \infty$. Let $\mathcal{G}$ be the group of translations

$$g_c(x) = (x_1 + c, \ldots, x_n + c), \qquad -\infty < c < \infty,\ n \ge 2.$$

Clearly, $g$ induces $\bar g$ on $\Theta$, where $\bar g\theta = \theta + c$. The hypothesis-testing problem
remains invariant under $\mathcal{G}$. A maximal invariant under $\mathcal{G}$ is
$T(X) = (X_1 - X_n, \ldots, X_{n-1} - X_n) = (T_1, T_2, \ldots, T_{n-1})$. The class of invariant tests coincides
with the class of tests that are functions of $T$. The PDF of $T$ under $H_i$ is independent
of $\theta$ and is given by $\int_{-\infty}^{\infty} f_i(t_1 + z, \ldots, t_{n-1} + z, z)\,dz$. The problem is thus reduced to
testing a simple hypothesis against a simple alternative. By the fundamental lemma
the MP test

$$\varphi(t_1, t_2, \ldots, t_{n-1}) = \begin{cases} 1 & \text{if } \lambda(t) > c, \\ 0 & \text{if } \lambda(t) < c, \end{cases}$$

where $t = (t_1, t_2, \ldots, t_{n-1})$ and

$$\lambda(t) = \frac{\displaystyle\int_{-\infty}^{\infty} f_1(t_1 + z, \ldots, t_{n-1} + z, z)\,dz}{\displaystyle\int_{-\infty}^{\infty} f_0(t_1 + z, \ldots, t_{n-1} + z, z)\,dz},$$

is UMP invariant.

A particular case of Example 4 will be, for instance, to test $H_0\colon X \sim N(\theta, 1)$
against $H_1\colon X \sim C(1, \theta)$, $\theta \in \mathbb{R}$ (see Problem 1).


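Example 5. Suppose that $(X, Y)$ has joint PDF

$$f_\theta(x, y) = \lambda\mu \exp(-\lambda x - \mu y), \qquad x > 0,\ y > 0,$$

and $= 0$ elsewhere, where $\theta = (\lambda, \mu)$, $\lambda > 0$, $\mu > 0$. Consider the scale group
$\mathcal{G} = \{\{0, c\}, c > 0\}$, which leaves $\{f_\theta\}$ invariant. Suppose that we wish to test $H_0\colon \mu \ge \lambda$
against $H_1\colon \mu < \lambda$. It is easy to see that $\bar{\mathcal{G}}\Theta_0 = \Theta_0$, so that $\mathcal{G}$ leaves $(\alpha, \Theta_0, \Theta_1)$
invariant and $T = Y/X$ is maximal invariant. The PDF of $T$ is given by

$$f_\theta^T(t) = \frac{\lambda\mu}{(\lambda + \mu t)^2}, \qquad t > 0, \qquad = 0 \text{ for } t \le 0.$$

The family $\{f_\theta^T\}$ has MLR in $T$, and hence a UMP invariant test of $H_0$ is of the form

$$\varphi(t) = \begin{cases} 1, & t > c(\alpha), \\ \gamma, & t = c(\alpha), \\ 0, & t < c(\alpha), \end{cases}$$

where

$$\alpha = \int_{c(\alpha)}^{\infty} \frac{1}{(1 + t)^2}\,dt \;\Rightarrow\; c(\alpha) = \frac{1 - \alpha}{\alpha}.$$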
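
A short numerical illustration of Example 5 may help. Integrating the density $\lambda\mu/(\lambda + \mu t)^2$ gives $P_\theta\{T > t\} = \lambda/(\lambda + \mu t)$, so both the cutoff and the power are available in closed form. The function names and parameter values below are illustrative assumptions.

```python
def critical_value(alpha: float) -> float:
    """Solve 1/(1 + c) = alpha, the integral of (1 + t)^(-2) over (c, infinity)."""
    return (1.0 - alpha) / alpha


def power(lam: float, mu: float, alpha: float) -> float:
    """P{Y/X > c(alpha)} for independent X ~ Exp(rate lam), Y ~ Exp(rate mu)."""
    c = critical_value(alpha)
    # P{T > t} = lam / (lam + mu * t), obtained by integrating lam*mu/(lam + mu*t)^2
    return lam / (lam + mu * c)


if __name__ == "__main__":
    alpha = 0.05  # illustrative level
    print("c(alpha) =", critical_value(alpha))
    for lam, mu in ((1.0, 2.0), (1.0, 1.0), (2.0, 1.0), (5.0, 1.0)):
        print(f"lambda = {lam}, mu = {mu}: power = {power(lam, mu, alpha):.4f}")
```

The power depends on $\theta$ only through $\mu/\lambda$, equals $\alpha$ on the boundary $\mu = \lambda$, and exceeds $\alpha$ when $\mu < \lambda$.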
PROBLEMS 9.5

1. To test $H_0\colon X \sim N(\theta, 1)$ against $H_1\colon X \sim C(1, \theta)$, a sample of size 2 is
available on $X$. Find a UMP invariant test of $H_0$ against $H_1$.

2. Let $X_1, X_2, \ldots, X_n$ be a sample from $P(\lambda)$. Find a UMP unbiased size $\alpha$ test
for the null hypothesis $H_0\colon \lambda \le \lambda_0$ against alternatives $\lambda > \lambda_0$ by the methods
of this section.

3. Let $X \sim NB(1; \theta)$. By the methods of this section, find a UMP unbiased size $\alpha$
test of $H_0\colon \theta \ge \theta_0$ against $H_1\colon \theta < \theta_0$.

4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Consider the problem of testing
$H_0\colon \mu \le 0$ against $H_1\colon \mu > 0$.
(a) It suffices to restrict attention to the sufficient statistic $(U, V)$, where $U = \bar X$
and $V = S^2$. Show that the problem of testing $H_0$ is invariant under
$\mathcal{G} = \{\{a, 1\}, a \in \mathbb{R}\}$ and a maximal invariant is $T = U/\sqrt{V}$.
(b) Show that the distribution of $T$ has MLR, and a UMP invariant test rejects
$H_0$ when $T > c$.

5. Let $X_1, X_2, \ldots, X_n$ be iid RVs and let $H_0$ be that $X_i \sim N(\theta, 1)$ and $H_1$ be
that the common PDF is $f_\theta(x) = \frac{1}{2}\exp(-|x - \theta|)$. Find the form of the UMP
invariant test of $H_0$ against $H_1$.

6. Let $X_1, X_2, \ldots, X_n$ be iid RVs and suppose that $H_0\colon X_i \sim N(0, 1)$ and
$H_1\colon X_i \sim f_1(x) = \exp(-|x|)/2$.

(a) Show that the problem of testing $H_0$ against $H_1$ is invariant under scale
changes $g_c(x) = cx$, $c > 0$, and a maximal invariant is
$T(X) = (X_1/X_n, \ldots, X_{n-1}/X_n)$.

(b) Show that the MP invariant test rejects $H_0$ when

$$\frac{\left(1 + \sum_{j=1}^{n-1} Y_j^2\right)^{1/2}}{1 + \sum_{j=1}^{n-1} |Y_j|} > k,$$

where $Y_j = X_j/X_n$, $j = 1, 2, \ldots, n - 1$, or equivalently, when

$$\frac{\left(\sum_{j=1}^{n} X_j^2\right)^{1/2}}{\sum_{j=1}^{n} |X_j|} > k.$$

9.6 LOCALLY MOST POWERFUL TESTS 


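In the preceding section we argued that whenever a UMP test does not exist, we
restrict the class of tests under consideration and then find a UMP test in the subclass.
Yet another approach when no UMP test exists is to restrict the parameter set to
a subset of $\Theta_1$. In most problems, the parameter values that are close to the null
hypothesis are the hardest to detect. Tests that have good power properties for “local
alternatives” may also retain good power properties for “nonlocal” alternatives.

Definition 1. Let $\Theta \subseteq \mathbb{R}$. Then a test $\varphi_0$ with power function $\beta_{\varphi_0}(\theta) = E_\theta \varphi_0(X)$
is said to be a locally most powerful (LMP) test of $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$
if there exists a $\Delta > 0$ such that for any other test $\varphi$ with

(1) $\beta_\varphi(\theta_0) = \beta_{\varphi_0}(\theta_0) = \int \varphi(x) f_{\theta_0}(x)\,dx$,

we have

(2) $\beta_{\varphi_0}(\theta) \ge \beta_\varphi(\theta)$ for every $\theta \in (\theta_0, \theta_0 + \Delta]$.

We assume that the tests under consideration have continuously differentiable
power function at $\theta = \theta_0$ and the derivative may be taken under the integral sign. In
that case, an LMP test maximizes

(3) $\dfrac{\partial}{\partial\theta}\beta_\varphi(\theta)\Big|_{\theta=\theta_0} = \beta_\varphi'(\theta)\Big|_{\theta=\theta_0} = \displaystyle\int \varphi(x)\,\dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta=\theta_0}\,dx$

subject to the size constraint (1). A slight extension of the Neyman–Pearson lemma
(Remark 9.3.2) implies that a test satisfying (1) and given by

(4) $\varphi_0(x) = \begin{cases} 1 & \text{if } \dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0} > k f_{\theta_0}(x), \\ \gamma & \text{if } \dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0} = k f_{\theta_0}(x), \\ 0 & \text{if } \dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0} < k f_{\theta_0}(x) \end{cases}$

will maximize $\beta_\varphi'(\theta_0)$. It is possible that a test that maximizes $\beta_\varphi'(\theta_0)$ is not LMP, but
if the test maximizes $\beta'(\theta_0)$ and is unique, it must be an LMP test (see Kallenberg et
al. [47, p. 290] and Lehmann [63, p. 528]).

Note that for $x$ for which $f_{\theta_0}(x) \ne 0$, we can write

$$\frac{\dfrac{\partial}{\partial\theta} f_\theta(x)\Big|_{\theta_0}}{f_{\theta_0}(x)} = \frac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0},$$

and we can rewrite

(5) $\varphi_0(x) = \begin{cases} 1 & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0} > k, \\ \gamma & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0} = k, \\ 0 & \text{if } \dfrac{\partial}{\partial\theta}\log f_\theta(x)\Big|_{\theta_0} < k. \end{cases}$
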
Example 1. Let $X_1, X_2, \ldots, X_n$ be iid with common normal PDF with mean $\mu$
and variance $\sigma^2$. If one of these parameters is unknown while the other is known,
the family of PDFs has MLR, and UMP tests exist for one-sided hypotheses for the
unknown parameter. Let us derive the LMP test in each case.

First consider the case when $\sigma^2$ is known, say $\sigma^2 = 1$, and $H_0\colon \mu \le 0$, $H_1\colon \mu > 0$.
An easy computation shows that an LMP test is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i > k, \\ 0 & \text{if } \sum_{i=1}^n x_i \le k, \end{cases}$$

which, of course, is the form of the UMP test obtained in Problem 9.4.1 by an application
of Theorem 9.4.2.

Next consider the case when $\mu$ is known, say $\mu = 0$, and $H_0\colon \sigma \le \sigma_0$, $H_1\colon \sigma > \sigma_0$.
Using (5), we see that an LMP test is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n x_i^2 > k, \\ 0 & \text{if } \sum_{i=1}^n x_i^2 \le k, \end{cases}$$

which coincides with the UMP test.

In each case the power function is differentiable and the derivatives may be taken
inside the integral sign because the PDF is a one-parameter exponential type PDF.


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF

$$f_\theta(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \qquad x \in \mathbb{R},$$

and consider the problem of testing $H_0\colon \theta \le 0$ against $H_1\colon \theta > 0$.

In this case $\{f_\theta\}$ does not have MLR. A direct computation using the Neyman–Pearson
lemma shows that an MP test of $\theta = 0$ against $\theta = \theta_1$, $\theta_1 > 0$, depends on
$\theta_1$ and hence cannot be MP for testing $\theta = 0$ against $\theta = \theta_2$, $\theta_2 \ne \theta_1$. Hence a UMP
test of $H_0$ against $H_1$ does not exist. An LMP test of $H_0$ against $H_1$ is of the form

$$\varphi_0(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n \dfrac{2x_i}{1 + x_i^2} > k, \\ 0 & \text{otherwise,} \end{cases}$$

where $k$ is chosen so that the size of $\varphi_0$ is $\alpha$. For small $n$ it is hard to compute $k$, but for
large $n$ it is easy to compute $k$ using the central limit theorem. Indeed, $X_i/(1 + X_i^2)$
are iid RVs with mean 0 and finite variance $(= 1/8)$, so that $k = z_\alpha\sqrt{n/2}$ will give an
(approximate) level $\alpha$ test for large $n$.

The test $\varphi_0$ is good at detecting small departures from $\theta \le 0$, but it is quite
unsatisfactory in detecting values of $\theta$ away from 0. In fact, for $\alpha < \frac{1}{2}$, $\beta_{\varphi_0}(\theta) \to 0$
as $\theta \to \infty$.
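
The ingredients of the Cauchy LMP test can be checked numerically. The sketch below is a hedged illustration using only the standard library: it evaluates the score statistic $\sum 2x_i/(1 + x_i^2)$, verifies by the substitution $x = \tan\phi$ that $E_0[X^2/(1 + X^2)^2] = 1/8$, and computes the central-limit cutoff $z_\alpha\sqrt{n/2}$. All function names and the default step count are assumptions made for the example.

```python
import math
from statistics import NormalDist


def score_statistic(xs):
    """Sum of d/dtheta log f_theta(x_i) at theta = 0, i.e. 2 x_i / (1 + x_i^2)."""
    return sum(2.0 * x / (1.0 + x * x) for x in xs)


def var_x_over_1_plus_x2(steps: int = 10_000) -> float:
    """E_0[X^2 / (1 + X^2)^2] for a standard Cauchy X.

    With x = tan(phi) the integral becomes (1/pi) * int sin^2(phi) cos^2(phi) dphi
    over (-pi/2, pi/2); it is approximated here by the midpoint rule.
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        phi = -math.pi / 2 + (i + 0.5) * h
        total += (math.sin(phi) * math.cos(phi)) ** 2
    return total * h / math.pi


def approx_cutoff(n: int, alpha: float) -> float:
    """CLT cutoff z_alpha * sqrt(n/2), using var(2X/(1 + X^2)) = 4 * (1/8) = 1/2."""
    return NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(n / 2.0)


if __name__ == "__main__":
    print("variance of X/(1+X^2):", var_x_over_1_plus_x2())
    print("approximate k for n = 50, alpha = 0.05:", approx_cutoff(50, 0.05))
```

Because each summand $2x/(1 + x^2)$ lies in $[-1, 1]$ and tends to 0 as $|x| \to \infty$, observations far from 0 contribute almost nothing, which is why the power collapses for distant alternatives.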

This procedure for finding locally best tests has applications in nonparametric 
statistics. We refer the reader to Randles and Wolfe [83, Sec. 9.1] for details. 


PROBLEMS 9.6 


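1. Let $X_1, X_2, \ldots, X_n$ be iid $C(1, 0)$ RVs. Show that
$E_0(1 + X_1^2)^{-k} = (1/\pi)\,B(k + \frac{1}{2}, \frac{1}{2})$. Hence or otherwise, show that

$$E_0\left[\frac{X_1^2}{(1 + X_1^2)^2}\right] = \operatorname{var}_0\left[\frac{X_1}{1 + X_1^2}\right] = \frac{1}{8}.$$

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from the logistic PDF

$$f_\theta(x) = \frac{1}{2[1 + \cosh(x - \theta)]} = \frac{e^{x-\theta}}{(1 + e^{x-\theta})^2}, \qquad x \in \mathbb{R}.$$

Show that the LMP test of $H_0\colon \theta = 0$ against $H_1\colon \theta > 0$ rejects $H_0$ if
$\sum_{i=1}^n \tanh(x_i/2) > k$.

3. Let $X_1, X_2, \ldots, X_n$ be iid RVs with the common Laplace PDF

$$f_\theta(x) = \tfrac{1}{2}\exp(-|x - \theta|).$$

For $n \ge 2$, show that a UMP size $\alpha$ $(0 < \alpha < 1)$ test of $H_0\colon \theta \le 0$ against
$H_1\colon \theta > 0$ does not exist. Find the form of the LMP test.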


CHAPTER 10 


Some Further Results of 
Hypothesis Testing 


10.1 INTRODUCTION 


In this chapter we study some commonly used procedures in the theory of testing 
of hypotheses. In Section 10.2 we describe the classical procedure for constructing 
tests based on likelihood ratios. This method is sufficiently general to apply to multi- 
parameter problems and is especially useful in the presence of nuisance parameters. 
These are unknown parameters in the model which are of no inferential interest. Most 
of the normal theory tests described in Sections 10.3 to 10.5 and those in Chapter 12 
can be derived by using methods of Section 10.2. In Sections 10.3 to 10.5 we list 
some commonly used normal theory-based tests. In Section 10.3 we also deal with 
goodness-of-fit tests. In Section 10.6 we look at the hypothesis testing problem from 
a decision-theoretic viewpoint and describe Bayes and minimax tests. 


10.2 GENERALIZED LIKELIHOOD RATIO TESTS


In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis 
testing. It was suggested that we restrict attention to smaller classes of tests and seek
UMP tests in these subclasses or, alternatively, seek tests that are optimal against 
local alternatives. Unfortunately, some of the reductions suggested in Chapter 9, such 
as invariance, do not apply to all families of distributions. 

In this section we consider a classical procedure for constructing tests that has 
some intuitive appeal and that frequently, though not necessarily, leads to optimal 
tests. Also, the procedure leads to tests that have some desirable large-sample prop- 
erties. 

Recall that for testing $H_0\colon X \sim f_0$ against $H_1\colon X \sim f_1$, the Neyman–Pearson MP
test is based on the ratio $f_1(x)/f_0(x)$. If we interpret the numerator as the best possible
explanation of $x$ under $H_1$, and the denominator as the best possible explanation
of $x$ under $H_0$, it is reasonable to consider the ratio

$$r(x) = \frac{\sup_{\theta \in \Theta_1} L(\theta; x)}{\sup_{\theta \in \Theta_0} L(\theta; x)} = \frac{\sup_{\theta \in \Theta_1} f_\theta(x)}{\sup_{\theta \in \Theta_0} f_\theta(x)}$$

as a test statistic for testing $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$. Here $L(\theta; x)$ is the
likelihood function of $x$. Note that for each $x$ for which the MLEs of $\theta$ under $\Theta_1$ and
$\Theta_0$ exist, the ratio is well defined and free of $\theta$ and can be used as a test statistic.
Clearly, we should reject $H_0$ if $r(x) > c$.

The statistic $r$ is hard to compute; only one of the two suprema in the ratio may be
attained. Let $\theta \in \Theta \subseteq \mathbb{R}_k$ be a vector of parameters, and let $X$ be a random vector
with PDF (PMF) $f_\theta$. Consider the problem of testing the null hypothesis
$H_0\colon X \sim f_\theta$, $\theta \in \Theta_0$, against the alternative $H_1\colon X \sim f_\theta$, $\theta \in \Theta_1$.


Definition 1. For testing $H_0$ against $H_1$, a test of the form: reject $H_0$ if and only
if $\lambda(x) < c$, where $c$ is a constant, and

$$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} f_\theta(x_1, x_2, \ldots, x_n)}{\sup_{\theta \in \Theta} f_\theta(x_1, x_2, \ldots, x_n)},$$

is called a generalized likelihood ratio (GLR) test.


We leave the reader to show that the statistics $\lambda(X)$ and $r(X)$ lead to the same
criterion for rejecting $H_0$.
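
One way to see the equivalence, sketched here under the assumptions that $\Theta = \Theta_0 \cup \Theta_1$ and that the supremum over $\Theta_0$ is finite and positive, is to write the supremum over $\Theta$ as the larger of the two restricted suprema:

```latex
\sup_{\theta\in\Theta} f_\theta(x)
  = \max\Bigl\{\sup_{\theta\in\Theta_0} f_\theta(x),\ \sup_{\theta\in\Theta_1} f_\theta(x)\Bigr\}
\quad\Longrightarrow\quad
\lambda(x) = \min\Bigl\{1,\ \frac{1}{r(x)}\Bigr\}.
```

Hence, for any $0 < c < 1$, $\lambda(x) < c$ holds exactly when $r(x) > 1/c$, so rejecting for small $\lambda$ is the same as rejecting for large $r$.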

The numerator of the likelihood ratio $\lambda$ is the best explanation of $X$ (in the sense of
maximum likelihood) that the null hypothesis $H_0$ can provide, and the denominator is
the best possible explanation of $X$. $H_0$ is rejected if there is a much better explanation
of $X$ than the best one provided by $H_0$.

It is clear that $0 \le \lambda \le 1$. The constant $c$ is determined from the size restriction

$$\sup_{\theta \in \Theta_0} P_\theta\{\lambda(X) < c\} = \alpha.$$

If the distribution of $\lambda$ is continuous (that is, the DF is absolutely continuous), any
size $\alpha$ is attainable. If, however, $\lambda(X)$ is a discrete RV, it may not be possible to find
a likelihood ratio test whose size exactly equals $\alpha$. This problem arises because of
the nonrandomized nature of the likelihood ratio test and can be handled by randomization.
The following result holds.


Theorem 1. If for given $\alpha$, $0 \le \alpha \le 1$, nonrandomized Neyman–Pearson and
likelihood ratio tests of a simple hypothesis against a simple alternative exist, they
are equivalent.


The proof is left as an exercise. 
Theorem 2. For testing $\theta \in \Theta_0$ against $\theta \in \Theta_1$, the likelihood ratio test is a
function of every sufficient statistic for $\theta$.


Theorem 2 follows from the factorization theorem for sufficient statistics. 


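Example 1. Let $X \sim b(n, p)$, and we seek a level $\alpha$ likelihood ratio test of
$H_0\colon p \le p_0$ against $H_1\colon p > p_0$:

$$\lambda(x) = \frac{\sup_{p \le p_0} \binom{n}{x} p^x (1 - p)^{n-x}}{\sup_{0 \le p \le 1} \binom{n}{x} p^x (1 - p)^{n-x}}.$$

Now

$$\sup_{0 \le p \le 1} p^x (1 - p)^{n-x} = \left(\frac{x}{n}\right)^x \left(1 - \frac{x}{n}\right)^{n-x}.$$

The function $p^x(1 - p)^{n-x}$ first increases, then achieves its maximum at $p = x/n$,
and finally decreases, so that

$$\sup_{p \le p_0} p^x(1 - p)^{n-x} = \begin{cases} p_0^x (1 - p_0)^{n-x} & \text{if } p_0 < \dfrac{x}{n}, \\ \left(\dfrac{x}{n}\right)^x \left(1 - \dfrac{x}{n}\right)^{n-x} & \text{if } \dfrac{x}{n} \le p_0. \end{cases}$$

It follows that

$$\lambda(x) = \begin{cases} \dfrac{p_0^x(1 - p_0)^{n-x}}{(x/n)^x\,[1 - (x/n)]^{n-x}} & \text{if } p_0 < \dfrac{x}{n}, \\ 1 & \text{if } \dfrac{x}{n} \le p_0. \end{cases}$$

Note that $\lambda(x) \le 1$ for $np_0 < x$ and $\lambda(x) = 1$ if $x \le np_0$, and it follows that $\lambda(x)$
is a decreasing function of $x$. Thus $\lambda(x) < c$ if and only if $x > c'$, and the GLR test
rejects $H_0$ if $x > c'$.

The GLR test is of the type obtained in Section 9.4 for families with an MLR
except for the boundary $\lambda(x) = c$. In other words, if the size of the test happens to
be exactly $\alpha$, the likelihood ratio test is a UMP level $\alpha$ test. Since $X$ is a discrete RV,
however, to obtain size $\alpha$ may not be possible. We have

$$\alpha = \sup_{p \le p_0} P_p\{X > c'\} = P_{p_0}\{X > c'\}.$$

If such a $c'$ does not exist, we choose an integer $c'$ such that

$$P_{p_0}\{X > c'\} \le \alpha \quad \text{and} \quad P_{p_0}\{X > c' - 1\} > \alpha.$$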
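
The binomial GLR and its cutoff are simple to compute directly. The following sketch uses only the standard library; the function names and the values $n = 10$, $p_0 = 0.5$, $\alpha = 0.05$ are illustrative assumptions.

```python
from math import comb


def glr(x: int, n: int, p0: float) -> float:
    """lambda(x) for H0: p <= p0 against H1: p > p0 when X ~ b(n, p)."""
    phat = x / n
    if phat <= p0:
        return 1.0  # the unrestricted MLE x/n already lies in the null set
    num = p0 ** x * (1 - p0) ** (n - x)
    den = phat ** x * (1 - phat) ** (n - x)
    return num / den


def upper_tail(c: int, n: int, p: float) -> float:
    """P_p{X > c} for X ~ b(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1, n + 1))


def cutoff(n: int, p0: float, alpha: float) -> int:
    """Smallest integer c' with P_{p0}{X > c'} <= alpha (so P_{p0}{X > c' - 1} > alpha)."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c


if __name__ == "__main__":
    n, p0, alpha = 10, 0.5, 0.05  # illustrative values only
    print([round(glr(x, n, p0), 4) for x in range(n + 1)])
    c = cutoff(n, p0, alpha)
    print("reject H0 if x >", c, "; attained size =", upper_tail(c, n, p0))
```

The printed values of $\lambda(x)$ are nonincreasing in $x$, and the attained size is strictly below $\alpha$, which is the discreteness issue described in the example.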
The situation in Example 1 is not unique. For a one-parameter exponential family
it can be shown (Birkes [6]) that a GLR test of $H_0\colon \theta \le \theta_0$ against $H_1\colon \theta > \theta_0$ is
UMP of its size. The result holds also for the dual $H_0'\colon \theta \ge \theta_0$ and, in fact, for a
much wider class of one-parameter families of distributions.

The GLR test is especially useful when $\theta$ is a multiparameter and we wish to
test a hypothesis concerning one of the parameters. The remaining parameters act as
nuisance parameters.


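Example 2. Consider the problem of testing $\mu = \mu_0$ against $\mu \ne \mu_0$ in sampling
from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. In this case
$\Theta_0 = \{(\mu_0, \sigma^2)\colon \sigma^2 > 0\}$ and $\Theta = \{(\mu, \sigma^2)\colon -\infty < \mu < \infty,\ \sigma^2 > 0\}$. We write
$\theta = (\mu, \sigma^2)$:

$$\sup_{\theta \in \Theta_0} f_\theta(x) = \sup_{\sigma^2 > 0}\left[\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{i=1}^n (x_i - \mu_0)^2}{2\sigma^2}\right\}\right] = f_{\hat\sigma_0^2}(x),$$

where $\hat\sigma_0^2$ is the MLE, $\hat\sigma_0^2 = (1/n)\sum_{i=1}^n (x_i - \mu_0)^2$. Thus

$$\sup_{\theta \in \Theta_0} f_\theta(x) = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_{i=1}^n (x_i - \mu_0)^2\right]^{n/2}}.$$

The MLE of $\theta = (\mu, \sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is
$\left(\sum_{i=1}^n x_i/n,\ \sum_{i=1}^n (x_i - \bar x)^2/n\right)$. It follows that

$$\sup_{\theta \in \Theta} f_\theta(x) = \sup_{\mu, \sigma^2}\left[\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right\}\right] = \frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^{n/2}}.$$

Thus

$$\lambda(x) = \left\{\frac{\sum_{i=1}^n (x_i - \bar x)^2}{\sum_{i=1}^n (x_i - \mu_0)^2}\right\}^{n/2} = \left\{\frac{1}{1 + \left[n(\bar x - \mu_0)^2 / \sum_{i=1}^n (x_i - \bar x)^2\right]}\right\}^{n/2}.$$

The GLR test rejects $H_0$ if

$$\lambda(x) < c,$$

and since $\lambda(x)$ is a decreasing function of $n(\bar x - \mu_0)^2 / \sum_{i=1}^n (x_i - \bar x)^2$, we reject $H_0$
if

$$\left|\frac{\bar x - \mu_0}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2}}\right| > c',$$

that is, if

$$\left|\frac{\sqrt{n}\,(\bar x - \mu_0)}{s}\right| > c'',$$

where $s^2 = (n - 1)^{-1}\sum_{i=1}^n (x_i - \bar x)^2$. The statistic

$$t(X) = \frac{\sqrt{n}\,(\bar X - \mu_0)}{S}$$

has a $t$-distribution with $n - 1$ d.f. Under $H_0\colon \mu = \mu_0$, $t(X)$ has a central $t(n - 1)$
distribution, but under $H_1\colon \mu \ne \mu_0$, $t(X)$ has a noncentral $t$-distribution with $n - 1$
d.f. and noncentrality parameter $\delta = \sqrt{n}\,(\mu - \mu_0)/\sigma$. We choose $c'' = t_{n-1,\alpha/2}$ in
accordance with the distribution of $t(X)$ under $H_0$. Note that the two-sided $t$-test
obtained here is UMP unbiased. Similarly, one can obtain one-sided $t$-tests also as
likelihood ratio tests.

The computations in Example 2 could be slightly simplified by using Theorem 2.
Indeed, $T(X) = (\bar X, S^2)$ is a minimal sufficient statistic for $\theta$, and since $\bar X$ and $S^2$
are independent, the likelihood is the product of the PDFs of $\bar X$ and $S^2$. We note that
$\bar X \sim N(\mu, \sigma^2/n)$ and $S^2 \sim [\sigma^2/(n - 1)]\,\chi^2_{n-1}$. We leave it to the reader to carry out
the details.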
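
The link between the likelihood ratio and the $t$ statistic in Example 2 is the identity $\lambda = [1 + t^2/(n - 1)]^{-n/2}$, which follows from $t^2/(n - 1) = n(\bar x - \mu_0)^2/\sum (x_i - \bar x)^2$. The sketch below checks it on made-up data; the sample values and function names are purely illustrative.

```python
import math


def glr_normal_mean(xs, mu0):
    """lambda(x) = [sum (x_i - xbar)^2 / sum (x_i - mu0)^2]^(n/2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss_full = sum((x - xbar) ** 2 for x in xs)   # n * sigma-hat^2 over Theta
    ss_null = sum((x - mu0) ** 2 for x in xs)    # n * sigma-hat_0^2 over Theta_0
    return (ss_full / ss_null) ** (n / 2)


def t_statistic(xs, mu0):
    """t = sqrt(n) (xbar - mu0) / s, with s^2 the sample variance (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return math.sqrt(n) * (xbar - mu0) / s


xs = [4.9, 5.6, 5.1, 6.2, 5.8, 5.3]   # hypothetical observations
mu0 = 5.0                             # hypothetical null value
lam = glr_normal_mean(xs, mu0)
t = t_statistic(xs, mu0)
n = len(xs)
print(lam, (1 + t * t / (n - 1)) ** (-n / 2))
```

Since $\lambda$ is a strictly decreasing function of $t^2$, thresholding $\lambda$ from below and thresholding $|t|$ from above define the same rejection region.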


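Example 3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random
samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. We wish to test the null
hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2$ against $H_1\colon \sigma_1^2 \ne \sigma_2^2$. Here

$$\Theta = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)\colon -\infty < \mu_i < \infty,\ \sigma_i^2 > 0,\ i = 1, 2\}$$

and

$$\Theta_0 = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)\colon -\infty < \mu_i < \infty,\ i = 1, 2,\ \sigma_1^2 = \sigma_2^2 > 0\}.$$

Let $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. Then the joint PDF is

$$f_\theta(x, y) = \frac{1}{(2\pi)^{(m+n)/2}\,\sigma_1^m\,\sigma_2^n}\exp\left\{-\frac{1}{2\sigma_1^2}\sum_{i=1}^m (x_i - \mu_1)^2 - \frac{1}{2\sigma_2^2}\sum_{i=1}^n (y_i - \mu_2)^2\right\}.$$

Also,

$$\log f_\theta(x, y) = -\frac{m + n}{2}\log 2\pi - \frac{m}{2}\log\sigma_1^2 - \frac{n}{2}\log\sigma_2^2 - \frac{\sum_{i=1}^m (x_i - \mu_1)^2}{2\sigma_1^2} - \frac{\sum_{i=1}^n (y_i - \mu_2)^2}{2\sigma_2^2}.$$

Differentiating with respect to $\mu_1$ and $\mu_2$, we obtain the MLEs

$$\hat\mu_1 = \bar x, \qquad \hat\mu_2 = \bar y.$$

Differentiating with respect to $\sigma_1^2$ and $\sigma_2^2$, we obtain the MLEs

$$\hat\sigma_1^2 = \frac{1}{m}\sum_{i=1}^m (x_i - \bar x)^2, \qquad \hat\sigma_2^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2.$$

If, however, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the MLE of $\sigma^2$ is

$$\hat\sigma^2 = \frac{\sum_{i=1}^m (x_i - \bar x)^2 + \sum_{i=1}^n (y_i - \bar y)^2}{m + n}.$$

Thus

$$\sup_{\theta \in \Theta_0} f_\theta(x, y) = \frac{e^{-(m+n)/2}}{[2\pi/(m + n)]^{(m+n)/2}\left[\sum_{1}^{m} (x_i - \bar x)^2 + \sum_{1}^{n} (y_i - \bar y)^2\right]^{(m+n)/2}}$$

and

$$\sup_{\theta \in \Theta} f_\theta(x, y) = \frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2}\,(2\pi/n)^{n/2}\left[\sum_{1}^{m} (x_i - \bar x)^2\right]^{m/2}\left[\sum_{1}^{n} (y_i - \bar y)^2\right]^{n/2}},$$

so that

$$\lambda(x, y) = \left(\frac{m + n}{m}\right)^{m/2}\left(\frac{m + n}{n}\right)^{n/2}\frac{\left[\sum_{1}^{m} (x_i - \bar x)^2\right]^{m/2}\left[\sum_{1}^{n} (y_i - \bar y)^2\right]^{n/2}}{\left[\sum_{1}^{m} (x_i - \bar x)^2 + \sum_{1}^{n} (y_i - \bar y)^2\right]^{(m+n)/2}}.$$

Now

$$\frac{\left[\sum_{1}^{m} (x_i - \bar x)^2\right]^{m/2}\left[\sum_{1}^{n} (y_i - \bar y)^2\right]^{n/2}}{\left[\sum_{1}^{m} (x_i - \bar x)^2 + \sum_{1}^{n} (y_i - \bar y)^2\right]^{(m+n)/2}} = \frac{1}{\left\{1 + \dfrac{\sum_{1}^{n} (y_i - \bar y)^2}{\sum_{1}^{m} (x_i - \bar x)^2}\right\}^{m/2}\left\{1 + \dfrac{\sum_{1}^{m} (x_i - \bar x)^2}{\sum_{1}^{n} (y_i - \bar y)^2}\right\}^{n/2}}.$$

Writing

$$f = \frac{\sum_{1}^{m} (x_i - \bar x)^2/(m - 1)}{\sum_{1}^{n} (y_i - \bar y)^2/(n - 1)},$$

we have

$$\lambda(x, y) = \left(\frac{m + n}{m}\right)^{m/2}\left(\frac{m + n}{n}\right)^{n/2}\frac{1}{\{1 + [(m - 1)/(n - 1)]\,f\}^{n/2}\,\{1 + [(n - 1)/(m - 1)]\,(1/f)\}^{m/2}}.$$

We leave the reader to check that $\lambda(x, y) < c$ is equivalent to $f < c_1$ or $f > c_2$.
(Take logarithms, and use properties of convex functions. Alternatively, differentiate
$\log\lambda$.)

Under $H_0$, the statistic

$$F = \frac{\sum_{1}^{m} (X_i - \bar X)^2/(m - 1)}{\sum_{1}^{n} (Y_i - \bar Y)^2/(n - 1)}$$

has an $F(m - 1, n - 1)$ distribution, so that $c_1$, $c_2$ can be selected. It is usual to take

$$P\{F \le c_1\} = P\{F \ge c_2\} = \frac{\alpha}{2}.$$

Under $H_1$, $(\sigma_2^2/\sigma_1^2)\,F$ has an $F(m - 1, n - 1)$ distribution.

In Example 3 we can obtain the same GLR test by focusing attention on the joint
sufficient statistic $(\bar X, \bar Y, S_X^2, S_Y^2)$, where $S_X^2$ and $S_Y^2$ are sample variances of the $X$’s
and the $Y$’s, respectively. In order to write down the likelihood function, we note
that $\bar X$, $\bar Y$, $S_X^2$, $S_Y^2$ are independent RVs. The distributions of $\bar X$ and $S_X^2$ are the same as
in Example 2 except that $m$ is the sample size. Distributions of $\bar Y$ and $S_Y^2$ require
appropriate modifications. We leave the reader to carry out the details. It turns out
that the GLR test coincides with the UMP unbiased test in this case.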

In certain situations the GLR test does not perform well. We reproduce here an 
example due to Stein and Rubin. 


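Example 4. Let $X$ be a discrete RV with PMF

$$P_{p=0}\{X = x\} = \begin{cases} \dfrac{\alpha}{2} & \text{if } x = \pm 2, \\ \dfrac{1 - 2\alpha}{2} & \text{if } x = \pm 1, \\ \alpha & \text{if } x = 0, \end{cases}$$

under the null hypothesis $H_0\colon p = 0$, and

$$P_p\{X = x\} = \begin{cases} pc & \text{if } x = -2, \\ \dfrac{1 - c}{1 - \alpha}\left(\dfrac{1}{2} - \alpha\right) & \text{if } x = \pm 1, \\ \alpha\left(\dfrac{1 - c}{1 - \alpha}\right) & \text{if } x = 0, \\ (1 - p)c & \text{if } x = 2, \end{cases}$$

under the alternative $H_1\colon p \in (0, 1)$, where $\alpha$ and $c$ are constants with

$$0 < \alpha < \frac{1}{2} \quad \text{and} \quad \frac{\alpha}{2 - \alpha} < c < \alpha.$$

To test the simple null hypothesis against the composite alternative at the level of
significance $\alpha$, let us compute the likelihood ratio $\lambda$. We have

$$\lambda(2) = \frac{\alpha/2}{c} = \frac{\alpha}{2c},$$

since $\alpha/2 < c$. Similarly, $\lambda(-2) = \alpha/(2c)$. Also,

$$\lambda(1) = \lambda(-1) = \frac{\frac{1}{2} - \alpha}{[(1 - c)/(1 - \alpha)]\left(\frac{1}{2} - \alpha\right)} = \frac{1 - \alpha}{1 - c},$$

and

$$\lambda(0) = \frac{\alpha}{\alpha\,[(1 - c)/(1 - \alpha)]} = \frac{1 - \alpha}{1 - c}.$$

The GLR test rejects $H_0$ if $\lambda(x) < k$, where $k$ is to be determined so that the level
is $\alpha$. We see that

$$P_0\left\{\lambda(X) < \frac{1 - \alpha}{1 - c}\right\} = P_0\{X = \pm 2\} = \alpha,$$

provided that $\alpha/(2c) < (1 - \alpha)/(1 - c)$. But $\alpha/(2 - \alpha) < c < \alpha$ implies that
$\alpha < 2c - c\alpha$, so that $\alpha - c\alpha < 2c - 2c\alpha$, or $\alpha(1 - c) < 2c(1 - \alpha)$, as required. Thus
the GLR size $\alpha$ test is to reject $H_0$ if $X = \pm 2$. The power of the GLR test is

$$P_p\left\{\lambda(X) < \frac{1 - \alpha}{1 - c}\right\} = P_p\{X = \pm 2\} = pc + (1 - p)c = c < \alpha$$

for all $p \in (0, 1)$. The test is not unbiased and is even worse than the trivial test
$\varphi(x) \equiv \alpha$.

Another test that is better than the trivial test is to reject $H_0$ whenever $x = 0$ (this
is opposite to what the likelihood ratio test says). Then

$$P_0\{X = 0\} = \alpha, \quad \text{and} \quad P_p\{X = 0\} = \alpha\,\frac{1 - c}{1 - \alpha} > \alpha \quad (\text{since } c < \alpha),$$

for all $p \in (0, 1)$, and the test is unbiased.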
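
The arithmetic of this counterexample can be verified exactly with rational numbers. In the sketch below the null PMF is taken to be $\alpha/2$, $(1 - 2\alpha)/2$, $\alpha$ at $x = \pm 2, \pm 1, 0$, as implied by the likelihood-ratio values in the example; the particular constants $\alpha = 1/10$, $c = 2/25$ and all function names are illustrative assumptions.

```python
from fractions import Fraction as F

SUPPORT = (-2, -1, 0, 1, 2)


def pmf_null(alpha):
    """PMF of X under H0: p = 0."""
    side = (1 - 2 * alpha) / 2
    return {-2: alpha / 2, -1: side, 0: alpha, 1: side, 2: alpha / 2}


def pmf_alt(p, alpha, c):
    """PMF of X under H1 for a given p in (0, 1)."""
    side = (1 - c) / (1 - alpha) * (F(1, 2) - alpha)
    return {-2: p * c, -1: side, 0: alpha * (1 - c) / (1 - alpha), 1: side, 2: (1 - p) * c}


def glr(x, alpha, c):
    """lambda(x) = P_0{X = x} / sup over the whole parameter set [0, 1)."""
    null = pmf_null(alpha)[x]
    if abs(x) == 2:
        sup_alt = c  # supremum of p*c or (1 - p)*c over 0 < p < 1 (not attained)
    else:
        sup_alt = pmf_alt(F(1, 2), alpha, c)[x]  # does not depend on p
    return null / max(null, sup_alt)


alpha, c = F(1, 10), F(2, 25)          # satisfies alpha/(2 - alpha) < c < alpha
k = (1 - alpha) / (1 - c)
region = [x for x in SUPPORT if glr(x, alpha, c) < k]
size = sum(pmf_null(alpha)[x] for x in region)
glr_power = sum(pmf_alt(F(1, 4), alpha, c)[x] for x in region)
alt_power = pmf_alt(F(1, 4), alpha, c)[0]   # power of the test 'reject iff x = 0'
print(region, size, glr_power, alt_power)
```

With these constants the GLR rejection region is $\{-2, 2\}$, its size is $\alpha = 1/10$, its power is $c = 2/25 < \alpha$ for every $p$, while the test that rejects only at $x = 0$ has power $23/225 > \alpha$.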


We will use the generalized likelihood ratio procedure quite frequently hereafter
because of its simplicity and wide applicability. The exact distribution of the test
statistic under $H_0$ is generally difficult to obtain (despite what we saw in Examples 1
to 3 above), and evaluation of the power function is also not possible in many problems.
Recall, however, that under certain conditions the asymptotic distribution of the MLE
is normal. This result can be used to prove the following large-sample property of
the GLR under $H_0$, which solves the problem of computation of the cutoff point $c$ at
least when the sample size is large.

Theorem 3. Under some regularity conditions on $f_\theta(x)$, the random variable
$-2\log\lambda(X)$ under $H_0$ is asymptotically distributed as a chi-square RV with degrees
of freedom equal to the difference between the number of independent parameters in
$\Theta$ and the number in $\Theta_0$.

We will not prove this result here; the reader is referred to Wilks [117, p. 419].
The regularity conditions are essentially those associated with Theorem 8.7.4. In
Example 2 the number of parameters unspecified under $H_0$ is 1 (namely, $\sigma^2$), and
under $H_1$ two parameters are unspecified ($\mu$ and $\sigma^2$), so that the asymptotic chi-square
distribution will have 1 d.f. Similarly, in Example 3, the d.f. $= 4 - 3 = 1$.


Example 5. In Example 2 we showed that in sampling from a normal population
with unknown mean $\mu$ and unknown variance $\sigma^2$, the likelihood ratio for testing
$H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$ is

$$\lambda(x) = \left[1 + \frac{n(\bar x - \mu_0)^2}{\sum_{i=1}^n (x_i - \bar x)^2}\right]^{-n/2}.$$

Thus

$$-2\log\lambda(X) = n\log\left[1 + \frac{n(\bar X - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2}\right].$$

Under $H_0$, $\sqrt{n}\,(\bar X - \mu_0)/\sigma \sim N(0, 1)$ and $\sum_{i=1}^n (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n - 1)$. Also,
$\sum_{i=1}^n (X_i - \bar X)^2/[(n - 1)\sigma^2] \xrightarrow{P} 1$. It follows that if $Z \sim N(0, 1)$, then $-2\log\lambda(X)$
has the same limiting distribution as $n\log[1 + Z^2/(n - 1)]$. Moreover,

$$\left(1 + \frac{Z^2}{n - 1}\right)^n \xrightarrow{\text{a.s.}} \exp(Z^2),$$

and since logarithm is a continuous function, we see that

$$n\log\left(1 + \frac{Z^2}{n - 1}\right) \xrightarrow{\text{a.s.}} Z^2.$$

Thus $-2\log\lambda(X) \xrightarrow{L} Y$, where $Y \sim \chi^2(1)$. This result is consistent with Theorem 3.
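
The limit used in Example 5 can be observed numerically for a fixed value $z$ of $Z$. The sketch below is only an illustration (the function name and the value $z = 1.96$ are assumptions); it evaluates $n\log[1 + z^2/(n - 1)]$ for increasing $n$.

```python
import math


def minus_two_log_lambda(n: int, z: float) -> float:
    """n * log(1 + z^2/(n - 1)): the limiting form of -2 log lambda for Z = z."""
    return n * math.log1p(z * z / (n - 1))


if __name__ == "__main__":
    z = 1.96  # illustrative value of the standard normal variable
    for n in (5, 50, 500, 5000):
        print(n, round(minus_two_log_lambda(n, z), 4), "target", round(z * z, 4))
```

The printed values approach $z^2$, in line with the $\chi^2(1)$ limit of $Z^2$ stated in the example.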


PROBLEMS 10.2 


1. Prove Theorems 1 and 2. 


2. A random sample of size $n$ is taken from the PMF $P(X_i = x_j) = p_j$,
$j = 1, 2, 3, 4$, $0 < p_j < 1$, $\sum_{j=1}^4 p_j = 1$. Find the form of the GLR test of
$H_0\colon p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$ against $H_1\colon p_1 = p_2 = p/2$, $p_3 = p_4 = (1 - p)/2$,
$0 < p < 1$.

3. Find the GLR test of $H_0\colon p = p_0$ against $H_1\colon p \ne p_0$, based on a sample of
size 1 from $b(n, p)$.

4. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown.
Find the GLR test of $H_0\colon \sigma = \sigma_0$ against $H_1\colon \sigma \ne \sigma_0$.

5. Let $X_1, X_2, \ldots, X_n$ be a sample from the PMF

$$P_N\{X = j\} = \frac{1}{N}, \qquad j = 1, 2, \ldots, N,\ N \ge 1 \text{ is an integer.}$$

(a) Find the GLR test of $H_0\colon N \le N_0$ against $H_1\colon N > N_0$.
(b) Find the GLR test of $H_0\colon N = N_0$ against $H_1\colon N \ne N_0$.

6. For a sample of size 1 from the PDF

$$f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \qquad 0 < x < \theta,$$

find the GLR test of $\theta = \theta_0$ against $\theta \ne \theta_0$.

7. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \beta)$.
(a) Find the GLR test of $\beta = \beta_0$ against $\beta \ne \beta_0$.
(b) Find the GLR test of $\beta \le \beta_0$ against $\beta > \beta_0$.

8. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal
population with $EX_i = \mu_1$, $EY_i = \mu_2$, $\operatorname{var}(X_i) = \sigma^2$, $\operatorname{var}(Y_i) = \sigma^2$,
and $\operatorname{cov}(X_i, Y_i) = \rho\sigma^2$. Show that the likelihood ratio test of the null hypothesis
$H_0\colon \rho = 0$ against $H_1\colon \rho \ne 0$ reduces to rejecting $H_0$ if $|R| > c$, where
$R = 2S_{11}/(S_1^2 + S_2^2)$, $S_{11}$, $S_1^2$, and $S_2^2$ being the sample covariance and the sample
variances, respectively. (For the PDF of the test statistic $R$, see Problem 7.7.1.)

9. Let $X_1, X_2, \ldots, X_m$ be iid $G(1, \theta)$ RVs and let $Y_1, Y_2, \ldots, Y_n$ be iid $G(1, \mu)$
RVs, where $\theta$ and $\mu$ are unknown positive real numbers. Assume that the $X$’s
and the $Y$’s are independent. Develop an $\alpha$-level GLR test for testing $H_0\colon \theta = \mu$
against $H_1\colon \theta \ne \mu$.

10. A die is tossed 60 times in order to test $H_0$: $P\{j\} = 1/6$, $j = 1, 2, \ldots, 6$ (die is
fair) against H,: P{2} = P{4) = P{6} = 5 P{1} = P{3) = P{5}= 5. Find 
the GLR test. 


11. Let $X_1, X_2, \ldots, X_n$ be iid with the common PDF $f_\theta(x) = \exp\{-(x - \theta)\}$,
$x \geq \theta$, and $= 0$ otherwise. Find the level $\alpha$ GLR test for testing $H_0$: $\theta \leq \theta_0$
against $H_1$: $\theta > \theta_0$.


12. Let $X_1, X_2, \ldots, X_n$ be iid RVs with the common Pareto PDF $f_\theta(x) = \theta/x^2$ for
$x > \theta$, and $= 0$ elsewhere. Show that the family of joint PDFs has MLR in $X_{(1)}$
and find a size $\alpha$ test of $H_0$: $\theta = \theta_0$ against $H_1$: $\theta > \theta_0$. Show that the GLR test
coincides with the UMP test.


10.3 CHI-SQUARE TESTS 


In this section we consider a variety of tests where the test statistic has an exact 
or a limiting chi-square distribution. Chi-square tests are also used for testing some 
nonparametric hypotheses and are taken up again in Chapter 13. 

We begin with tests concerning variances in sampling from a normal population.
Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where $\sigma^2$ is unknown. We wish to test a
hypothesis of the type $\sigma^2 \geq \sigma_0^2$, $\sigma^2 \leq \sigma_0^2$, or $\sigma^2 = \sigma_0^2$, where $\sigma_0$ is some given
positive number. We summarize the tests in the following table:


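                                              Reject $H_0$ at Level $\alpha$ if:

      $H_0$                     $H_1$                       $\mu$ Known                                                        $\mu$ Unknown
I.    $\sigma \geq \sigma_0$    $\sigma < \sigma_0$         $\sum_{i=1}^{n}(x_i - \mu)^2 \leq \chi^2_{n,1-\alpha}\,\sigma_0^2$      $s^2 \leq \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha}$
II.   $\sigma \leq \sigma_0$    $\sigma > \sigma_0$         $\sum_{i=1}^{n}(x_i - \mu)^2 \geq \chi^2_{n,\alpha}\,\sigma_0^2$        $s^2 \geq \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha}$
III.  $\sigma = \sigma_0$       $\sigma \neq \sigma_0$      $\sum_{i=1}^{n}(x_i - \mu)^2 \leq \chi^2_{n,1-\alpha/2}\,\sigma_0^2$    $s^2 \leq \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2}$
                                                            or                                                                 or
                                                            $\sum_{i=1}^{n}(x_i - \mu)^2 \geq \chi^2_{n,\alpha/2}\,\sigma_0^2$      $s^2 \geq \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2}$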


Remark 1. All these tests can be derived by the standard likelihood ratio procedure.
If $\mu$ is unknown, tests I and II are UMP unbiased (and UMP invariant). If $\mu$
is known, tests I and II are UMP (see Example 9.4.5). For tests III we have chosen
constants $c_1$, $c_2$ so that each tail has probability $\alpha/2$. This is the customary procedure,
even though it destroys the unbiasedness property of the tests, at least for small
samples.


Example 1. A manufacturer claims that the lifetime of a certain brand of batteries
produced by his factory has a variance of 5000 (hours)$^2$. A sample of size 26 has a
variance of 7200 (hours)$^2$. Assuming that it is reasonable to treat these data as a
random sample from a normal population, let us test the manufacturer's claim at the
$\alpha = 0.02$ level. Here $H_0$: $\sigma^2 = 5000$ is to be tested against $H_1$: $\sigma^2 \neq 5000$. We
reject $H_0$ if either

$$s^2 = 7200 \leq \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2} \qquad \text{or} \qquad s^2 \geq \frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2}.$$

We have

$$\frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,1-\alpha/2} = \frac{5000}{25} \times 11.524 = 2304.8$$

and

$$\frac{\sigma_0^2}{n-1}\,\chi^2_{n-1,\alpha/2} = \frac{5000}{25} \times 44.314 = 8862.8.$$

Since $s^2$ is neither $\leq 2304.8$ nor $\geq 8862.8$, we cannot reject the manufacturer's
claim at the 0.02 level.
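
The arithmetic of Example 1 can be retraced with a short sketch. The function name `variance_test_bounds` is hypothetical, and the two chi-square percentage points are simply the table values quoted in the example, not values computed by the code.

```python
def variance_test_bounds(n, sigma0_sq, chi2_low, chi2_high):
    """Two-sided cutoffs for s^2 when mu is unknown (test III).

    chi2_low  = chi^2_{n-1, 1-alpha/2} and chi2_high = chi^2_{n-1, alpha/2},
    both read from a chi-square table (assumed inputs, not computed here).
    """
    scale = sigma0_sq / (n - 1)
    return scale * chi2_low, scale * chi2_high


# Example 1: n = 26, sigma_0^2 = 5000, alpha = 0.02, table values for 25 d.f.
lower, upper = variance_test_bounds(26, 5000.0, 11.524, 44.314)
s_sq = 7200.0
reject = s_sq <= lower or s_sq >= upper
print(lower, upper, reject)
```

Since 7200 lies strictly between the two cutoffs, `reject` is false, in agreement with the conclusion of the example.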


A test based on a chi-square statistic is also used for testing the equality of several
proportions. Let $X_1, X_2, \ldots, X_k$ be independent RVs with $X_i \sim b(n_i, p_i)$,
$i = 1, 2, \ldots, k$, $k \geq 2$.


Theorem 1. The RV $\sum_{i=1}^{k}\left[(X_i - n_i p_i)/\sqrt{n_i p_i(1 - p_i)}\right]^2$ converges in distribution
to the $\chi^2(k)$ RV as $n_1, n_2, \ldots, n_k \to \infty$.


The proof is left as an exercise. 


If $n_1, n_2, \ldots, n_k$ are large, we can use Theorem 1 to test $H_0$: $p_1 = p_2 = \cdots = p_k = p$
against all alternatives. If $p$ is known, we compute

$$y = \sum_{i=1}^{k}\left[\frac{x_i - n_i p}{\sqrt{n_i p(1 - p)}}\right]^2,$$

and if $y \geq \chi^2_{k,\alpha}$ we reject $H_0$. In practice, $p$ will be unknown. Let $\mathbf{p} = (p_1, p_2, \ldots, p_k)$.
Then the likelihood function is

$$L(\mathbf{p}; x_1, \ldots, x_k) = \prod_{i=1}^{k}\left[\binom{n_i}{x_i} p_i^{x_i}(1 - p_i)^{n_i - x_i}\right],$$

so that

$$\log L(\mathbf{p}; \mathbf{x}) = \sum_{i=1}^{k}\log\binom{n_i}{x_i} + \sum_{i=1}^{k} x_i \log p_i + \sum_{i=1}^{k}(n_i - x_i)\log(1 - p_i).$$

The MLE $\hat{p}$ of $p$ under $H_0$ is therefore given by

$$\frac{\sum_{i=1}^{k} x_i}{p} - \frac{\sum_{i=1}^{k}(n_i - x_i)}{1 - p} = 0,$$

that is,

$$\hat{p} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}.$$

Under certain regularity assumptions (see Cramér [16, pp. 426-427]) it can be shown
that the statistic

$$Y_1 = \sum_{i=1}^{k}\frac{(X_i - n_i\hat{p})^2}{n_i\hat{p}(1 - \hat{p})} \tag{1}$$

is asymptotically $\chi^2(k - 1)$. Thus the test rejects $H_0$: $p_1 = p_2 = \cdots = p_k = p$,
$p$ unknown, at level $\alpha$ if $y_1 \geq \chi^2_{k-1,\alpha}$.

It should be remembered that the tests based on Theorem 1 are all large-sample
tests and hence not exact, in contrast to the tests concerning the variance discussed
above, which are all exact tests. In the case $k = 1$, UMP tests of $p \geq p_0$ and $p \leq p_0$
exist and can be obtained by the MLR method described in Section 9.4. For testing
$p = p_0$, the usual test is UMP unbiased.

In the case $k = 2$, if $n_1$ and $n_2$ are large, a test based on the normal distribution
can be used instead of Theorem 1. In this case the statistic

$$Z = \frac{X_1/n_1 - X_2/n_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}, \tag{2}$$

where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$, is asymptotically $N(0, 1)$ under $H_0$: $p_1 = p_2 = p$.
If $p$ is known, one uses $p$ instead of $\hat{p}$. It is not too difficult to show that $Z^2$ is equal
to $Y_1$, so that the two tests are equivalent.
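
The stated identity between the squared normal statistic and the chi-square statistic for $k = 2$ can be checked numerically. The sketch below uses hypothetical counts (45 successes in 100 trials and 60 in 120); the function names are illustrative and not from the text.

```python
import math


def y1_statistic(xs, ns):
    """Chi-square statistic for equal proportions with the pooled estimate of p."""
    p_hat = sum(xs) / sum(ns)
    return sum((x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
               for x, n in zip(xs, ns))


def z_statistic(x1, n1, x2, n2):
    """Two-sample normal statistic using the same pooled estimate of p."""
    p_hat = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))


x1, n1, x2, n2 = 45, 100, 60, 120  # hypothetical data
z = z_statistic(x1, n1, x2, n2)
y1 = y1_statistic([x1, x2], [n1, n2])
print(z ** 2, y1)  # the two values agree up to rounding error
```

The agreement follows because $x_1 - n_1\hat{p} = -(x_2 - n_2\hat{p})$, so both statistics reduce to the same multiple of $(x_1 - n_1\hat{p})^2$.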

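For small samples the Fisher-Irwin test is commonly used and is based on the
conditional distribution of $X_1$ given $T = X_1 + X_2$. Let $\rho = [p_1(1 - p_2)]/[p_2(1 - p_1)]$.
Then

$$P(X_1 + X_2 = t) = \sum_{j=0}^{t}\binom{n_1}{j} p_1^{j}(1 - p_1)^{n_1 - j}\binom{n_2}{t-j} p_2^{t-j}(1 - p_2)^{n_2 - t + j}
= \sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^{j}\, a(n_1, n_2),$$

where

$$a(n_1, n_2) = (1 - p_1)^{n_1}(1 - p_2)^{n_2}\left(\frac{p_2}{1 - p_2}\right)^{t}.$$

It follows that

$$P\{X_1 = x \mid X_1 + X_2 = t\}
= \frac{\binom{n_1}{x} p_1^{x}(1 - p_1)^{n_1 - x}\binom{n_2}{t-x} p_2^{t-x}(1 - p_2)^{n_2 - t + x}}{a(n_1, n_2)\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^{j}}
= \frac{\binom{n_1}{x}\binom{n_2}{t-x}\rho^{x}}{\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^{j}}.$$

On the boundary of any of the hypotheses $p_1 = p_2$, $p_1 \leq p_2$, or $p_1 \geq p_2$, we note
that $\rho = 1$, so that

$$P\{X_1 = x \mid X_1 + X_2 = t\} = \frac{\binom{n_1}{x}\binom{n_2}{t-x}}{\binom{n_1 + n_2}{t}},$$
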
which is a hypergeometric distribution. For testing $H_0$: $p_1 \leq p_2$ this conditional test
rejects if $X_1 \geq k(t)$ (large values of $X_1$ favor $p_1 > p_2$), where $k(t)$ is the smallest integer
for which $P\{X_1 \geq k(t) \mid T = t\} \leq \alpha$. Obvious modifications yield critical regions for
testing $p_1 = p_2$ and $p_1 \geq p_2$ against corresponding alternatives.
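
The conditional (hypergeometric) tail probabilities used by the Fisher-Irwin test are easy to evaluate exactly. The following is a minimal sketch with hypothetical numbers ($n_1 = 8$, $n_2 = 7$, $t = 5$, observed $x_1 = 4$); the function names are illustrative only.

```python
from math import comb


def hypergeom_pmf(x, t, n1, n2):
    """P{X1 = x | X1 + X2 = t} when rho = 1 (that is, p1 = p2)."""
    return comb(n1, x) * comb(n2, t - x) / comb(n1 + n2, t)


def upper_tail(x_obs, t, n1, n2):
    """P{X1 >= x_obs | T = t} under rho = 1."""
    return sum(hypergeom_pmf(x, t, n1, n2) for x in range(x_obs, t + 1))


def lower_tail(x_obs, t, n1, n2):
    """P{X1 <= x_obs | T = t} under rho = 1."""
    return sum(hypergeom_pmf(x, t, n1, n2) for x in range(0, x_obs + 1))


# Hypothetical data: n1 = 8, n2 = 7, t = 5 successes in all, 4 of them in sample 1.
p_upper = upper_tail(4, 5, 8, 7)
p_lower = lower_tail(4, 5, 8, 7)
print(p_upper, p_lower)
```

A one-sided conditional test at level $\alpha$ compares the relevant tail probability with $\alpha$; which tail is relevant depends on the direction of the alternative.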


In applications a wide variety of problems can be reduced to the multinomial
distribution model. We therefore consider the problem of testing the parameters of a
multinomial distribution. Let $(X_1, X_2, \ldots, X_{k-1})$ be a sample from a multinomial
distribution with parameters $n, p_1, p_2, \ldots, p_{k-1}$, and let us write $X_k = n - X_1 - \cdots - X_{k-1}$
and $p_k = 1 - p_1 - \cdots - p_{k-1}$. The difference between the model of
Theorem 1 and the multinomial model is the independence of the $X_i$'s.


Theorem 2. Let $(X_1, X_2, \ldots, X_{k-1})$ be a multinomial RV with parameters $n$,
$p_1, p_2, \ldots, p_{k-1}$. Then the RV

$$U_k = \sum_{i=1}^{k}\frac{(X_i - np_i)^2}{np_i} \tag{3}$$

is asymptotically distributed as a $\chi^2(k - 1)$ RV (as $n \to \infty$).


Proof. For the general proof we refer the reader to Cramér [16, pp. 417-419] or
Ferguson [26, p. 61]. We will consider here the $k = 2$ case to make the result a little
more plausible. We have

$$U_2 = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}
= \frac{(X_1 - np_1)^2}{np_1} + \frac{[n - X_1 - n(1 - p_1)]^2}{n(1 - p_1)}$$

$$= (X_1 - np_1)^2\left[\frac{1}{np_1} + \frac{1}{n(1 - p_1)}\right]
= \frac{(X_1 - np_1)^2}{np_1(1 - p_1)}.$$

It follows from Theorem 1 that $U_2 \xrightarrow{L} Y$ as $n \to \infty$, where $Y \sim \chi^2(1)$.
To use Theorem 2 to test $H_0$: $p_1 = p_1', \ldots, p_k = p_k'$, we need only to compute
the quantity

$$u = \sum_{i=1}^{k}\frac{(x_i - np_i')^2}{np_i'}$$

from the sample; if $n$ is large, we reject $H_0$ if $u \geq \chi^2_{k-1,\alpha}$.


Example 2. A die is rolled 120 times with the following results:


Result        1    2    3    4    5    6
Frequency    20   30   20   25   15   10


Let us test the hypothesis that the die is fair at level $\alpha = 0.05$. The null hypothesis
is $H_0$: $p_i = \frac{1}{6}$, $i = 1, 2, \ldots, 6$, where $p_i$ is the probability that the face value is $i$,
$1 \leq i \leq 6$. By Theorem 2, we reject $H_0$ if

$$u = \sum_{i=1}^{6}\frac{[x_i - 120(\frac{1}{6})]^2}{120(\frac{1}{6})} \geq \chi^2_{5,0.05}.$$

We have

$$u = \frac{10^2}{20} + \frac{5^2}{20} + \frac{5^2}{20} + \frac{10^2}{20} = 12.5.$$

Since $\chi^2_{5,0.05} = 11.07$, we reject $H_0$. Note that if we choose $\alpha = 0.025$, then
$\chi^2_{5,0.025} = 12.8$, and we cannot reject at this level.
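
The statistic of Theorem 2 is a one-line computation. The sketch below recomputes $u$ for the die data; `pearson_statistic` is a hypothetical helper name, and the critical values are those quoted in the example.

```python
def pearson_statistic(observed, probs):
    """u = sum (x_i - n p_i)^2 / (n p_i) for multinomial counts x_i."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))


frequencies = [20, 30, 20, 25, 15, 10]          # Example 2, 120 rolls
u = pearson_statistic(frequencies, [1 / 6] * 6)
reject_at_05 = u >= 11.07                        # chi^2_{5, 0.05} from the table
reject_at_025 = u >= 12.8                        # chi^2_{5, 0.025} from the table
print(u, reject_at_05, reject_at_025)
```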


Theorem 2 has much wider applicability, and we will later study its application 
to contingency tables. Here we consider the application of Theorem 2 to testing the 
null hypothesis that the DF of an RV X has a specified form. 


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample on $X$. Also, let $H_0$: $X \sim F$,
where the functional form of the DF $F$ is known completely. Consider a collection
of disjoint Borel sets $A_1, A_2, \ldots, A_k$ that form a partition of the real line. Let
$P\{X \in A_i\} = p_i$, $i = 1, 2, \ldots, k$, and assume that $p_i > 0$ for each $i$. Let $Y_j$ =
number of $X_i$'s in $A_j$, $j = 1, 2, \ldots, k$, $i = 1, 2, \ldots, n$. Then the joint distribution
of $(Y_1, Y_2, \ldots, Y_{k-1})$ is multinomial with parameters $n, p_1, p_2, \ldots, p_{k-1}$. Clearly,
$Y_k = n - Y_1 - \cdots - Y_{k-1}$ and $p_k = 1 - p_1 - \cdots - p_{k-1}$.


The proof of Theorem 3 is obvious. One frequently selects $A_1, A_2, \ldots, A_k$ as
disjoint intervals. Theorem 3 is especially useful when one or more of the parameters 
associated with the DF F are unknown. In that case the following result is useful. 


Theorem 4. Let $H_0$: $X \sim F_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ is unknown. Let
$X_1, X_2, \ldots, X_n$ be independent observations on $X$, and suppose that the MLEs of
$\theta_1, \theta_2, \ldots, \theta_r$ exist and are, respectively, $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r$. Let $A_1, A_2, \ldots, A_k$ be a
collection of disjoint Borel sets that cover the real line, and let

$$\hat{p}_i = P_{\hat\theta}\{X \in A_i\} > 0, \qquad i = 1, 2, \ldots, k,$$

where $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r)$, and $P_\theta$ is the probability distribution associated with $F_\theta$.
Let $Y_1, Y_2, \ldots, Y_k$ be the RVs defined as follows: $Y_i$ = number of $X_1, X_2, \ldots, X_n$
in $A_i$, $i = 1, 2, \ldots, k$.
Then the RV

$$V_k = \sum_{i=1}^{k}\frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i}$$

is asymptotically distributed as a $\chi^2(k - r - 1)$ RV (as $n \to \infty$).
The proof of Theorem 4 and some regularity conditions required on $F_\theta$ are given
in Rao [86, pp. 391-392].
To test $H_0$: $X \sim F$, where $F$ is completely specified, we reject $H_0$ if

$$u = \sum_{i=1}^{k}\frac{(y_i - np_i)^2}{np_i} > \chi^2_{k-1,\alpha},$$

provided that $n$ is sufficiently large. If the null hypothesis is $H_0$: $X \sim F_\theta$, where $F_\theta$
is known except for the parameter $\theta$, we use Theorem 4 and reject $H_0$ if

$$v = \sum_{i=1}^{k}\frac{(y_i - n\hat{p}_i)^2}{n\hat{p}_i} > \chi^2_{k-r-1,\alpha},$$

where $r$ is the number of parameters estimated.


Example 3. The following data were obtained from a table of random numbers
of normal distribution with mean 0 and variance 1.

 0.464    0.137    2.455   -0.323   -0.068
 0.906   -0.513   -0.525    0.595    0.881
-0.482    1.678   -0.057   -1.229   -0.486
-1.787   -0.261    1.237    1.046   -0.508


We want to test the null hypothesis that the DF $F$ from which the data came is
normal with mean 0 and variance 1. Here $F$ is completely specified. Let us choose
three intervals $(-\infty, -0.5]$, $(-0.5, 0.5]$, and $(0.5, \infty)$. We see that $Y_1 = 5$, $Y_2 = 8$,
and $Y_3 = 7$.

Also, if $Z$ is $N(0, 1)$, then $p_1 = 0.3085$, $p_2 = 0.3830$, and $p_3 = 0.3085$. Thus

$$u = \sum_{i=1}^{3}\frac{(y_i - np_i)^2}{np_i}
= \frac{(5 - 20 \times 0.3085)^2}{6.17} + \frac{(8 - 20 \times 0.383)^2}{7.66} + \frac{(7 - 20 \times 0.3085)^2}{6.17} < 1.$$

Also, $\chi^2_{2,0.05} = 5.99$, so we cannot reject $H_0$ at level 0.05.
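
The binning and the statistic in Example 3 can be reproduced as follows. This is a sketch only: the standard normal DF is evaluated through `math.erf`, so the cell probabilities are not rounded to four places as in the text, and the variable names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal DF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


data = [0.464, 0.137, 2.455, -0.323, -0.068,
        0.906, -0.513, -0.525, 0.595, 0.881,
        -0.482, 1.678, -0.057, -1.229, -0.486,
        -1.787, -0.261, 1.237, 1.046, -0.508]

# Cells (-inf, -0.5], (-0.5, 0.5], (0.5, inf)
counts = [sum(x <= -0.5 for x in data),
          sum(-0.5 < x <= 0.5 for x in data),
          sum(x > 0.5 for x in data)]
probs = [normal_cdf(-0.5),
         normal_cdf(0.5) - normal_cdf(-0.5),
         1.0 - normal_cdf(0.5)]

n = len(data)
u = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))
print(counts, [round(p, 4) for p in probs], u)
```

The computed value of `u` is well below the critical value 5.99 quoted in the example.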


Example 4. In a 72-hour period on a long holiday weekend, there was a total of
306 fatal automobile accidents. The data are as follows:


Number of Fatal Accidents
per Hour                      Number of Hours
0 or 1                               4
2                                   10
3                                   15
4                                   12
5                                   12
6                                    6
7                                    6
8 or more                            7


Let us test the hypothesis that the number of accidents per hour is a Poisson RV.
Since the mean of the Poisson RV is not given, we estimate it by

$$\hat\lambda = \bar{x} = \frac{306}{72} = 4.25.$$

Let us now estimate $p_i = P_{\hat\lambda}\{X = i\}$, $i = 0, 1, 2, \ldots$; $\hat{p}_0 = e^{-\hat\lambda} = 0.0143$. Note
that

$$\frac{P\{X = x + 1\}}{P\{X = x\}} = \frac{\lambda}{x + 1},$$

so that $\hat{p}_{i+1} = [\hat\lambda/(i + 1)]\hat{p}_i$. Thus

$\hat{p}_1 = 0.0606$, $\hat{p}_2 = 0.1288$, $\hat{p}_3 = 0.1825$, $\hat{p}_4 = 0.1939$,
$\hat{p}_5 = 0.1648$, $\hat{p}_6 = 0.1167$, $\hat{p}_7 = 0.0709$, $\hat{p}_8 = 1 - 0.9325 = 0.0675$,

where $\hat{p}_8$ is the estimated probability of 8 or more accidents.
The observed and expected frequencies are as follows:


                               0 or 1     2       3       4       5      6      7    8 or More
Observed frequency, $o_i$         4      10      15      12      12      6      6       7
Expected frequency
  $= 72\hat{p}_i = e_i$          5.38    9.28   13.14   13.96   11.87   8.41   5.10    4.86


The value of the statistic is $v = \sum_{i}(o_i - e_i)^2/e_i = 2.74$.
Since we estimated one parameter, the number of degrees of freedom is $k - r - 1 =
8 - 1 - 1 = 6$. From Table ST3, $\chi^2_{6,0.05} = 12.6$, and since $2.74 < 12.6$, we cannot
reject the null hypothesis.
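
The Poisson fit of Example 4 can be sketched in a few lines, using the recursion $\hat{p}_{i+1} = [\hat\lambda/(i+1)]\hat{p}_i$ from the example. Because the code does not round the probabilities to four places, the resulting statistic differs from the tabulated 2.74 only in the second decimal place; variable names are illustrative.

```python
import math

observed = [4, 10, 15, 12, 12, 6, 6, 7]   # cells: 0 or 1, 2, 3, ..., 7, 8 or more
n_hours, n_accidents = 72, 306
lam = n_accidents / n_hours               # MLE of the Poisson mean

# p[0], ..., p[7] via the recursion p_{i+1} = lam / (i + 1) * p_i
p = [math.exp(-lam)]
for i in range(7):
    p.append(p[-1] * lam / (i + 1))

# Pool 0 and 1 into one cell; the last cell is the remaining tail mass.
cell_probs = [p[0] + p[1]] + p[2:8] + [1.0 - sum(p)]
expected = [n_hours * q for q in cell_probs]

v = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                # k - r - 1 with r = 1 estimated parameter
print(lam, [round(e, 2) for e in expected], v, df)
```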


Remark 2. Any application of Theorem 3 or 4 requires that we choose sets
$A_1, A_2, \ldots, A_k$, and frequently these are chosen to be disjoint intervals. As a rule
of thumb, we choose the length of each interval in such a way that the probability
$P\{X \in A_i\}$ under $H_0$ is approximately $1/k$. Moreover, it is desirable to have
$n/k \geq 5$ or, rather, $e_i \geq 5$ for each $i$. If any of the $e_i$'s is $< 5$, the corresponding
interval is pooled with one or more adjoining intervals to make the cell frequency at
least 5. If any pooling is done, the number of degrees of freedom is the number of
classes after pooling, minus 1, minus the number of parameters estimated.


Finally, we consider a test of homogeneity of several multinomial distributions.
Suppose that we have $c$ samples of sizes $n_1, n_2, \ldots, n_c$ from $c$ multinomial distributions.
Let the associated probabilities with the $j$th population be $(p_{1j}, p_{2j}, \ldots, p_{rj})$,
where $\sum_{i=1}^{r} p_{ij} = 1$, $j = 1, 2, \ldots, c$. Given observations $N_{ij}$, $i = 1, 2, \ldots, r$,
$j = 1, 2, \ldots, c$, with $\sum_{i=1}^{r} N_{ij} = n_j$, $j = 1, 2, \ldots, c$, we wish to test $H_0$: $p_{ij} = p_i$,
for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r - 1$. The case $c = 1$ is covered by Theorem 2.
By Theorem 2 for each $j$,

$$U_j = \sum_{i=1}^{r}\frac{(N_{ij} - n_j p_i)^2}{n_j p_i}$$

has a limiting $\chi^2_{r-1}$ distribution. Since samples are independent, the statistic

$$U_{rc} = \sum_{j=1}^{c}\sum_{i=1}^{r}\frac{(N_{ij} - n_j p_i)^2}{n_j p_i}$$

has a limiting $\chi^2_{c(r-1)}$ distribution. If $p_i$'s are unknown, we use the MLEs

$$\hat{p}_i = \frac{\sum_{j=1}^{c} N_{ij}}{\sum_{j=1}^{c} n_j}, \qquad i = 1, 2, \ldots, r - 1,$$

for $p_i$, and we see that the statistic

$$V_{rc} = \sum_{j=1}^{c}\sum_{i=1}^{r}\frac{(N_{ij} - n_j\hat{p}_i)^2}{n_j\hat{p}_i}$$

has a limiting chi-square distribution with $c(r - 1) - (r - 1) = (c - 1)(r - 1)$ d.f. We reject
$H_0$ at (approximate) level $\alpha$ if $V_{rc} > \chi^2_{(r-1)(c-1),\alpha}$.


Example 5. A market analyst believes that there is no difference in preferences of
television viewers among the four Ohio cities of Toledo, Columbus, Cleveland, and
Cincinnati. To test this belief, independent random samples of 150, 200, 250, and
200 persons were selected from the four cities and asked, “What type of program
do you prefer most: mystery, soap, comedy, or news documentary?” The following
responses were recorded:


                                   City
Program Type    Toledo    Columbus    Cleveland    Cincinnati
Mystery           50         70          85            60
Soap              45         50          58            40
Comedy            35         50          72            67
News              20         30          35            33
Sample size      150        200         250           200


Under the null hypothesis that the proportions of viewers who prefer the four
types of programs are the same in each city, the maximum likelihood estimates of
$p_i$, $i = 1, 2, 3, 4$ are given by

$$\hat{p}_1 = \frac{50 + 70 + 85 + 60}{150 + 200 + 250 + 200} = \frac{265}{800} = 0.33,$$

$$\hat{p}_2 = \frac{45 + 50 + 58 + 40}{800} = \frac{193}{800} = 0.24,$$

$$\hat{p}_3 = \frac{35 + 50 + 72 + 67}{800} = \frac{224}{800} = 0.28,$$

$$\hat{p}_4 = \frac{20 + 30 + 35 + 33}{800} = \frac{118}{800} = 0.15.$$

Here $\hat{p}_1$ = proportion of people who prefer mystery, and so on. The following table
gives the expected frequencies under $H_0$:


                        Expected Number of Responses Under $H_0$

Program Type    Toledo               Columbus            Cleveland             Cincinnati
Mystery         150 x 0.33 = 49.5    200 x 0.33 = 66     250 x 0.33 = 82.5     200 x 0.33 = 66
Soap            150 x 0.24 = 36      200 x 0.24 = 48     250 x 0.24 = 60       200 x 0.24 = 48
Comedy          150 x 0.28 = 42      200 x 0.28 = 56     250 x 0.28 = 70       200 x 0.28 = 56
News            150 x 0.15 = 22.5    200 x 0.15 = 30     250 x 0.15 = 37.5     200 x 0.15 = 30
Sample size     150                  200                 250                   200


It follows that

$$u_{44} = \frac{(50 - 49.5)^2}{49.5} + \frac{(45 - 36)^2}{36} + \frac{(35 - 42)^2}{42} + \frac{(20 - 22.5)^2}{22.5}$$
$$\quad + \frac{(70 - 66)^2}{66} + \frac{(50 - 48)^2}{48} + \frac{(50 - 56)^2}{56} + \frac{(30 - 30)^2}{30}$$
$$\quad + \frac{(85 - 82.5)^2}{82.5} + \frac{(58 - 60)^2}{60} + \frac{(72 - 70)^2}{70} + \frac{(35 - 37.5)^2}{37.5}$$
$$\quad + \frac{(60 - 66)^2}{66} + \frac{(40 - 48)^2}{48} + \frac{(67 - 56)^2}{56} + \frac{(33 - 30)^2}{30}$$
$$= 9.37.$$

Since $c = 4$ and $r = 4$, the number of degrees of freedom is $(4 - 1)(4 - 1) = 9$ and
we note that under $H_0$

$$0.30 < P\{U_{44} \geq 9.37\} < 0.50.$$

With such a large P-value we can hardly reject $H_0$. The data do not offer any evidence
to conclude that the proportions in the four cities are different.

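The homogeneity statistic of Example 5 can be recomputed with the sketch below. The helper `homogeneity_statistic` is a hypothetical name; it accepts either the two-decimal estimates used in the example or, when none are supplied, the unrounded MLEs, which give a slightly different value.

```python
def homogeneity_statistic(table, p_hat=None):
    """Sum over cells of (N_ij - n_j p_i)^2 / (n_j p_i).

    table[i][j] is the count for category i in sample j. If p_hat is None,
    the unrounded MLEs (row total / grand total) are used.
    """
    col_sizes = [sum(col) for col in zip(*table)]
    if p_hat is None:
        total = sum(col_sizes)
        p_hat = [sum(row) / total for row in table]
    return sum((table[i][j] - col_sizes[j] * p_hat[i]) ** 2 / (col_sizes[j] * p_hat[i])
               for i in range(len(table)) for j in range(len(col_sizes)))


# Rows: mystery, soap, comedy, news; columns: Toledo, Columbus, Cleveland, Cincinnati
table = [[50, 70, 85, 60],
         [45, 50, 58, 40],
         [35, 50, 72, 67],
         [20, 30, 35, 33]]

v_rounded = homogeneity_statistic(table, [0.33, 0.24, 0.28, 0.15])  # estimates as rounded in the text
v_exact = homogeneity_statistic(table)                              # unrounded MLEs
df = (len(table) - 1) * (len(table[0]) - 1)
print(round(v_rounded, 2), v_exact, df)
```
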
PROBLEMS 10.3 


1. The standard deviation of capacity for batteries of a standard type is known to 
be 1.66 ampere-hours. The following capacities (ampere-hours) were recorded 
for 10 batteries of a new type: 146, 141, 135, 142, 140, 143, 138, 137, 142, 136.
Does the new battery differ from the standard type with respect to variability of 
capacity? (Natrella [73, p. 4-1]) 


2. A manufacturer recorded the cutoff bias (volts) of a sample of 10 tubes as fol- 
lows: 12.1, 12.3, 11.8, 12.0, 12.4, 12.0, 12.1, 11.9, 12.2, 12.2. The variability of 
cutoff bias for tubes of a standard type as measured by the standard deviation is 
0.208 volt. Is the variability of the new tube with respect to cutoff bias less than 
that of the standard type? (Natrella [73, p. 4-5]) 


3. Approximately equal numbers of four different types of meters are in service and 
all types are believed to be equally likely to break down. The actual numbers of 
breakdowns reported are as follows: 


Type of Meter 1 2 3 4 
Number of Breakdowns Reported 30 40 33 47 


Is there evidence to conclude that the chances of failure of the four types are not 
equal? (Natrella [73, p. 9-4]) 


4. Every clinical thermometer is classified into one of four categories, A, B, C, D, 
on the basis of inspection and test. From past experience it is known that ther- 
mometers produced by a certain manufacturer are distributed among the four 
categories in the following proportions: 


Category A B Cc D 
Proportion 0.87 0.09 0.03 0.01 


A new lot of 1336 thermometers is submitted by the manufacturer for inspection 
and test and the following distribution into the four categories results: 


Category A B C D
Number of Thermometers Reported 1188 91 47 10 


Does this new lot of thermometers differ from the previous experience with re- 
gard to proportion of thermometers in each category? (Natrella [73, p. 9-2]) 


5. A computer program is written to generate random numbers, X, uniformly in
the interval 0 < X < 10. From 250 consecutive values the following data are
obtained:


X-Value      0-1.99   2-3.99   4-5.99   6-7.99   8-9.99
Frequency      38       55       54       41       62


Do these data offer any evidence that the program is not written properly?


6. A machine working correctly cuts pieces of wire to a mean length of 10.5 cm 
with a standard deviation of 0.15 cm. Sixteen samples of wire were drawn at 
random from a production batch and measured with the following results
(centimeters): 10.4, 10.6, 10.1, 10.3, 10.2, 10.9, 10.5, 10.8, 10.6, 10.5, 10.7, 10.2,
10.7, 10.3, 10.4, 10.5. Test the hypothesis that the machine is working correctly. 


7. An experiment consists in tossing a coin until the first head shows up. One hundred
repetitions of this experiment are performed. The frequency distribution of
the number of trials required for the first head is as follows:


Number of Trials    1    2    3    4    5 or more
Frequency          40   32   15    7    6


Can we conclude that the coin is fair?


8. Fit a binomial distribution to the following data:


x            0    1    2
Frequency    8   46   55


9. Prove Theorem 1.


10. Three dice are rolled independently 360 times each with the following results.

Face Value     Die 1    Die 2    Die 3
1                50       62
2                48       55
3                69       61
4                45       54
5                71       78
6                77       50
Sample size     360      360


Are all the dice equally loaded? That is, test the hypothesis $H_0$: $p_{i1} = p_{i2} = p_{i3}$,
$i = 1, 2, \ldots, 6$, where $p_{i1}$ is the probability of getting an $i$ with die 1, and
so on.


11. Independent random samples of 250 Democrats, 150 Republicans, and 100 Independent
voters were selected one week before a nonpartisan election for mayor
of a large city. Their preference for candidates Albert, Basu, and Chatfield were
recorded as follows.


                        Party Affiliation
Preference     Democrat    Republican    Independent
Albert           160           70            90
Basu              32           45            25
Chatfield         30           23            15
Undecided         28           12            20
Sample size      250          150           150

Are the proportions of voters in favor of Albert, Basu, and Chatfield the same
within each political affiliation?


12. Of 25 income tax returns audited in a small town, 10 were from low- and middle- 
income families and 15 from high-income families. Two of the low-income
families and four of the high-income families were found to have underpaid their
taxes. Are the two proportions of families who underpaid taxes the same? 


13. A candidate for a congressional seat checks her progress by taking a random
sample of 20 voters each week. Last week, six reported to be in her favor. This
week nine reported to be in her favor. Is there evidence to suggest that her campaign
is working?


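14. Let $\{X_{11}, X_{21}, \ldots, X_{r1}\}, \ldots, \{X_{1c}, X_{2c}, \ldots, X_{rc}\}$ be independent multinomial
RVs with parameters $(n_1, p_{11}, p_{21}, \ldots, p_{r1}), \ldots, (n_c, p_{1c}, p_{2c}, \ldots, p_{rc})$,
respectively. Let $X_{i\cdot} = \sum_{j=1}^{c} X_{ij}$ and $\sum_{j=1}^{c} n_j = n$. Show that the GLR test
for testing $H_0$: $p_{ij} = p_i$, for $j = 1, 2, \ldots, c$, $i = 1, 2, \ldots, r - 1$, where $p_i$'s
are unknown, against all alternatives can be based on the statistic

$$\lambda(\mathbf{X}) = \prod_{i=1}^{r}\left(\frac{X_{i\cdot}}{n}\right)^{X_{i\cdot}} \Bigg/ \prod_{i=1}^{r}\prod_{j=1}^{c}\left(\frac{X_{ij}}{n_j}\right)^{X_{ij}}.$$
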


10.4 t-TESTS

In this section we investigate one of the most frequently used types of tests in statistics,
the tests based on a t-statistic. Let $X_1, X_2, \ldots, X_n$ be a random sample from
$N(\mu, \sigma^2)$, and, as usual, let us write

$$\bar{X} = n^{-1}\sum_{i=1}^{n} X_i, \qquad S^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
1 1 


The tests for usual null hypotheses about the mean can be derived using the GLR 
method. In the following table we summarize the results. 


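                                              Reject $H_0$ at Level $\alpha$ if:

      $H_0$                 $H_1$                 $\sigma^2$ Known                                                   $\sigma^2$ Unknown
I.    $\mu \leq \mu_0$      $\mu > \mu_0$         $\bar{x} \geq \mu_0 + \frac{\sigma}{\sqrt{n}}\,z_\alpha$           $\bar{x} \geq \mu_0 + \frac{s}{\sqrt{n}}\,t_{n-1,\alpha}$
II.   $\mu \geq \mu_0$      $\mu < \mu_0$         $\bar{x} \leq \mu_0 + \frac{\sigma}{\sqrt{n}}\,z_{1-\alpha}$       $\bar{x} \leq \mu_0 + \frac{s}{\sqrt{n}}\,t_{n-1,1-\alpha}$
III.  $\mu = \mu_0$         $\mu \neq \mu_0$      $|\bar{x} - \mu_0| \geq \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2}$     $|\bar{x} - \mu_0| \geq \frac{s}{\sqrt{n}}\,t_{n-1,\alpha/2}$
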
Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II
are called one-tailed tests; the t-test in III, a two-tailed test.


Remark 2. If $\sigma^2$ is known, tests I and II are UMP and test III is UMP unbiased.
If $\sigma^2$ is unknown, the t-tests are UMP unbiased and UMP invariant.


Remark 3. If n is large, we may use normal tables instead of t-tables. The as- 
sumption of normality may also be dropped because of the central limit theorem. For 
small samples care is required in applying the proper test, since the tail probabili- 
ties under normal distribution and t-distribution differ significantly for small n (see 
Remark 7.4.2). 


Example 1. Nine determinations of copper in a certain solution yielded a sample
mean of 8.3 percent with a standard deviation of 0.025 percent. Let $\mu$ be the mean
of the population of such determinations. Let us test $H_0$: $\mu = 8.42$ against $H_1$: $\mu <
8.42$ at level $\alpha = 0.05$.

Here $n = 9$, $\bar{x} = 8.3$, $s = 0.025$, $\mu_0 = 8.42$, and $t_{n-1,1-\alpha} = -t_{8,0.05} = -1.860$.
Thus

$$\mu_0 + \frac{s}{\sqrt{n}}\,t_{n-1,1-\alpha} = 8.42 - \frac{0.025}{3}\,1.86 = 8.4045.$$

We reject $H_0$ since $8.3 < 8.4045$.
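
The cutoff in Example 1 is a direct evaluation of test II with $\sigma^2$ unknown. In the sketch below the function name is hypothetical and the value 1.860 is the tabulated $t_{8,0.05}$ quoted in the example.

```python
import math


def lower_tail_cutoff(mu0, s, n, t_upper):
    """mu0 + (s / sqrt(n)) * t_{n-1, 1-alpha}, using t_{n-1, 1-alpha} = -t_{n-1, alpha}."""
    return mu0 - s / math.sqrt(n) * t_upper


cutoff = lower_tail_cutoff(mu0=8.42, s=0.025, n=9, t_upper=1.860)
reject = 8.3 <= cutoff
print(cutoff, reject)
```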
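We next consider the two-sample case. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$
be independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Let
us write

$$\bar{X} = m^{-1}\sum_{i=1}^{m} X_i, \qquad \bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i,$$

$$S_1^2 = (m - 1)^{-1}\sum_{i=1}^{m}(X_i - \bar{X})^2, \qquad S_2^2 = (n - 1)^{-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2,$$

and

$$S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}.$$

$S_p^2$ is sometimes called the pooled sample variance. The following table summarizes
the two-sample tests comparing $\mu_1$ and $\mu_2$: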




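                                                         Reject $H_0$ at Level $\alpha$ if:

      $H_0$                      $H_1$                         $\sigma_1^2, \sigma_2^2$ Known                                                                        $\sigma_1^2, \sigma_2^2$ Unknown, $\sigma_1 = \sigma_2$
      ($\delta$ = known constant)
I.    $\mu_1 - \mu_2 \leq \delta$   $\mu_1 - \mu_2 > \delta$      $\bar{x} - \bar{y} \geq \delta + z_\alpha\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$          $\bar{x} - \bar{y} \geq \delta + t_{m+n-2,\alpha}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$
II.   $\mu_1 - \mu_2 \geq \delta$   $\mu_1 - \mu_2 < \delta$      $\bar{x} - \bar{y} \leq \delta - z_\alpha\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$          $\bar{x} - \bar{y} \leq \delta - t_{m+n-2,\alpha}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$
III.  $\mu_1 - \mu_2 = \delta$      $\mu_1 - \mu_2 \neq \delta$   $|\bar{x} - \bar{y} - \delta| \geq z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$    $|\bar{x} - \bar{y} - \delta| \geq t_{m+n-2,\alpha/2}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$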


Remark 4. The case of most interest is that in which $\delta = 0$. If $\sigma_1^2$, $\sigma_2^2$ are unknown
and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown, then $S_p^2$ is an unbiased estimate of $\sigma^2$.
In this case all the two-sample t-tests are UMP unbiased and UMP invariant. Before
applying the t-test, one should first make sure that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\sigma^2$ unknown. This
means applying another test on the data. We consider this test in the next section.


Remark 5. If m+n is large, we use normal tables; if both m and n are large, we 
can drop the assumption of normality, using the CLT. 


Remark 6. The problem of equality of means in sampling from several popula- 
tions will be considered in Chapter 12. 


Remark 7. The two-sample problem when $\sigma_1 \neq \sigma_2$, both unknown, is commonly
referred to as the Behrens-Fisher problem. The Welch approximate t-test of
$H_0$: $\mu_1 = \mu_2$ is based on a random number of d.f. $f$ given by

$$f = \left[\left(\frac{R}{1 + R}\right)^2\frac{1}{m - 1} + \frac{1}{(1 + R)^2}\,\frac{1}{n - 1}\right]^{-1},$$

where

$$R = \frac{S_1^2/m}{S_2^2/n},$$

and the t-statistic

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$$

with $f$ d.f. This approximation has been found to be quite good even for small samples.
The formula for $f$ generally leads to noninteger d.f. Linear interpolation in
t-tables can be used to obtain the required percentiles for $f$ d.f.
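
The d.f. formula of Remark 7 can be evaluated directly. The sketch below also computes the algebraically equivalent form $(a + b)^2/[a^2/(m-1) + b^2/(n-1)]$ with $a = s_1^2/m$, $b = s_2^2/n$ as a cross-check. The input values are purely illustrative (sample sizes 9 and 16 with standard deviations 420 and 390); the function names are hypothetical.

```python
def welch_df(s1_sq, m, s2_sq, n):
    """Approximate d.f. f of Remark 7, written in terms of R = (s1^2/m) / (s2^2/n)."""
    r = (s1_sq / m) / (s2_sq / n)
    inv_f = (r / (1 + r)) ** 2 / (m - 1) + 1 / ((1 + r) ** 2 * (n - 1))
    return 1 / inv_f


def welch_df_direct(s1_sq, m, s2_sq, n):
    """Equivalent form (a + b)^2 / (a^2/(m-1) + b^2/(n-1)) with a = s1^2/m, b = s2^2/n."""
    a, b = s1_sq / m, s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))


f = welch_df(420.0 ** 2, 9, 390.0 ** 2, 16)           # illustrative inputs
f_check = welch_df_direct(420.0 ** 2, 9, 390.0 ** 2, 16)
print(f, f_check)
```

The value of `f` is generally not an integer and always lies between $\min(m - 1, n - 1)$ and $m + n - 2$.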


Example 2. The mean life of a sample of 9 light bulbs was observed to be 1309
hours with a standard deviation of 420 hours. A second sample of 16 bulbs chosen
from a different batch showed a mean life of 1205 hours with a standard deviation
of 390 hours. Let us test to see whether there is a significant difference between the
means of the two batches, assuming that the population variances are the same (see
also Example 10.5.1).

Here $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \neq \mu_2$, $m = 9$, $n = 16$, $\bar{x} = 1309$, $s_1 = 420$,
$\bar{y} = 1205$, $s_2 = 390$, and let us take $\alpha = 0.05$. We have

$$s_p^2 = \frac{8(420)^2 + 15(390)^2}{23},$$

so that

$$t_{m+n-2,\alpha/2}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}} = t_{23,0.025}\sqrt{\frac{8(420)^2 + 15(390)^2}{23}}\sqrt{\frac{1}{9} + \frac{1}{16}} = 345.44.$$

Since $|\bar{x} - \bar{y}| = |1309 - 1205| = 104 < 345.44$, we cannot reject $H_0$ at level
$\alpha = 0.05$.
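
The pooled two-sample computation of Example 2 can be sketched as follows. The percentile $t_{23,0.025} = 2.069$ is a standard t-table value supplied here as an assumption, since the example quotes only the final product; the helper name is hypothetical.

```python
import math


def pooled_sd(s1, m, s2, n):
    """Square root of the pooled sample variance S_p^2."""
    return math.sqrt(((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2))


m, n = 9, 16
x_bar, y_bar = 1309.0, 1205.0
s_p = pooled_sd(420.0, m, 390.0, n)
t_crit = 2.069                                  # t_{23, 0.025}, assumed from a t-table
threshold = t_crit * s_p * math.sqrt(1 / m + 1 / n)
reject = abs(x_bar - y_bar) >= threshold
print(s_p, threshold, reject)
```

The threshold agrees with the 345.44 of the example up to rounding in the tabulated percentile.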


Quite frequently, one samples from a bivariate normal population with means
$\mu_1$, $\mu_2$, variances $\sigma_1^2$, $\sigma_2^2$, and correlation coefficient $\rho$, the hypothesis of interest
being $\mu_1 = \mu_2$. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
normal distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Then $X_j - Y_j$ is
$N(\mu_1 - \mu_2, \sigma^2)$, where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. We can therefore treat $D_j = X_j - Y_j$,
$j = 1, 2, \ldots, n$, as a sample from a normal population. Let us write

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad \text{and} \qquad s_d^2 = \frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n - 1}.$$


The following table summarizes the resulting tests: 


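      $H_0$                       $H_1$
      ($d_0$ = known constant)                                  Reject $H_0$ at Level $\alpha$ if:
I.    $\mu_1 - \mu_2 \geq d_0$    $\mu_1 - \mu_2 < d_0$         $\bar{d} \leq d_0 + \frac{s_d}{\sqrt{n}}\,t_{n-1,1-\alpha}$
II.   $\mu_1 - \mu_2 \leq d_0$    $\mu_1 - \mu_2 > d_0$         $\bar{d} \geq d_0 + \frac{s_d}{\sqrt{n}}\,t_{n-1,\alpha}$
III.  $\mu_1 - \mu_2 = d_0$       $\mu_1 - \mu_2 \neq d_0$      $|\bar{d} - d_0| \geq \frac{s_d}{\sqrt{n}}\,t_{n-1,\alpha/2}$
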
Remark 8. The case of most importance is that in which $d_0 = 0$. All the t-tests,
based on $D_j$'s, are UMP unbiased and UMP invariant. If $\sigma$ is known, one can base
the test on a standardized normal RV, but in practice such an assumption is quite
unrealistic. If $n$ is large, one can replace t-values by the corresponding critical values
under the normal distribution.


Remark 9. Clearly, it is not necessary to assume that $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a
sample from a bivariate normal population. It suffices to assume that the differences
$D_j$ form a sample from a normal population.


Example 3. Nine adults agreed to test the efficacy of a new diet program. Their
weights (pounds) were measured before and after the program and found to be as
follows:


                          Participant
           1     2     3     4     5     6     7     8     9
Before    132   139   126   114   122   132   142   119   126
After     124   141   118   116   114   132   145   123   121


Let us test the null hypothesis that the diet is not effective, $H_0$: $\mu_1 - \mu_2 = 0$,
against the alternative, $H_1$: $\mu_1 - \mu_2 > 0$, that it is effective at level $\alpha = 0.01$. We
compute

$$\bar{d} = 2, \qquad s_d^2 = 26.75, \qquad \text{and} \qquad s_d = 5.17.$$

Thus

$$d_0 + \frac{s_d}{\sqrt{n}}\,t_{n-1,\alpha} = 0 + \frac{5.17}{\sqrt{9}}\,t_{8,0.01} = \frac{5.17}{3} \times 2.896 = 4.99.$$

Since $\bar{d} = 2 < 4.99$, we cannot reject the hypothesis $H_0$ that the diet is not very effective.
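
The paired computation of Example 3 can be retraced from the raw weights. In this sketch the percentile $t_{8,0.01} = 2.896$ is the table value quoted in the example, and the variable names are illustrative.

```python
import math

before = [132, 139, 126, 114, 122, 132, 142, 119, 126]
after = [124, 141, 118, 116, 114, 132, 145, 123, 121]

d = [b - a for b, a in zip(before, after)]       # differences D_j = X_j - Y_j
n = len(d)
d_bar = sum(d) / n
s_d_sq = sum((x - d_bar) ** 2 for x in d) / (n - 1)
s_d = math.sqrt(s_d_sq)

t_crit = 2.896                                    # t_{8, 0.01} from the table
threshold = 0 + s_d / math.sqrt(n) * t_crit       # d_0 = 0
reject = d_bar >= threshold
print(d_bar, s_d_sq, round(s_d, 2), round(threshold, 2), reject)
```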


PROBLEMS 10.4 


1. The manufacturer of a certain subcompact car claims that the average mileage 
of this model is 30 miles per gallon of regular gasoline. For nine cars of this
model driven in an identical manner, using 1 gallon of regular gasoline, the mean 
distance traveled was 26 miles with a standard deviation of 2.8 miles. Test the 
manufacturer’s claim if you are willing to reject a true claim no more than twice 
in 100. 


2. The nicotine contents of five cigarettes of a certain brand showed a mean of 21.2 
milligrams with a standard deviation of 20.05 milligrams. Test the hypothesis 
that the average nicotine content of this brand of cigarettes does not exceed 19.7
milligrams. Use a = 0.05. 


3. The additional hours of sleep gained by eight patients in an experiment with a
certain drug were recorded as follows:


Patient          1      2      3      4      5      6      7      8
Hours Gained    0.7   -1.1    3.4    0.8    2.0    0.1   -0.2    3.0


Assuming that these patients form a random sample from a population of such 
patients and that the number of additional hours gained from the drug is a normal 
random variable, test the hypothesis that the drug has no effect at level a = 0.10. 


4. The mean life of a sample of 8 light bulbs was found to be 1432 hours with a
standard deviation of 436 hours. A second sample of 19 bulbs chosen from a
different batch produced a mean life of 1310 hours with a standard deviation 
of 382 hours. Making appropriate assumptions, test the hypothesis that the two 
samples came from the same population of light bulbs at level a = 0.05. 


5. A sample of 25 observations has a mean of 57.6 and a variance of 1.8. A fur-
ther sample of 20 values has a mean of 55.4 and a variance of 20.5. Test the 
hypothesis that the two samples came from the same normal population. 


6. Two methods were used in a study of the latent heat of fusion of ice. Both method
A and method B were conducted with the specimens cooled to -0.72°C. The
following data represent the change in total heat from —0.72°C to water, 0°C, in 
calories per gram of mass: 


Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 
80.03, 80.02, 80.00, 80.02 
Method B: 80.02, 79.74, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97 


Perform a test at level 0.05 to see whether the two methods differ with regard to 
their average performance. (Natrella [73, p. 3-23]) 


7. In Problem 6, if it is known from past experience that the standard deviations of
the two methods are $\sigma_A = 0.024$ and $\sigma_B = 0.033$, test the hypothesis that the
methods are the same with regard to their average performance at level $\alpha = 0.05$.


8. During World War II bacterial polysaccharides were investigated as blood
plasma extenders. Sixteen samples of hydrolyzed polysaccharides supplied by
various manufacturers in order to assess two chemical methods for determining
the average molecular weight yielded the following results:


Method A: 62,700; 29,100; 44,400; 47,800; 36,300; 40,000; 43,400; 35,800;
33,900; 44,200; 34,300; 31,300; 38,400; 47,100; 42,100; 42,200

Method B: 56,400; 27,500; 42,200; 46,800; 33,300; 37,100; 37,300; 36,200;
35,200; 38,000; 32,200; 27,300; 36,100; 43,100; 38,400; 39,900

Perform an appropriate test of the hypothesis that the two averages are the same
against a one-sided alternative that the average of method A exceeds that of 
method B. Use a = 0.05. (Natrella [73, p. 3-38]) 


9. The following grade-point averages were collected over a period of 7 years to 
determine whether membership in a fraternity is beneficial or detrimental to
grades:


                              Year
                  1     2     3     4     5     6     7
Fraternity       2.4   2.0   2.3   2.1   2.1   2.0   2.0
Nonfraternity    2.4   2.2   2.5   2.4   2.3   1.8   1.9


Assuming that the populations were normal, test at the 0.025 level of significance 
whether membership in a fraternity is detrimental to grades. 


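10. Consider the two-sample t-statistic $T = (\bar{X} - \bar{Y})/[S_p\sqrt{1/m + 1/n}]$, where
$S_p^2 = [(m - 1)S_1^2 + (n - 1)S_2^2]/(m + n - 2)$. Suppose that $\sigma_1 \neq \sigma_2$. Let $m, n \to \infty$
such that $m/(m + n) \to \rho$. Show that under $\mu_1 = \mu_2$, $T \xrightarrow{L} U$, where
$U \sim N(0, \tau^2)$ with $\tau^2 = [(1 - \rho)\sigma_1^2 + \rho\sigma_2^2]/[\rho\sigma_1^2 + (1 - \rho)\sigma_2^2]$. Thus when
$m \approx n$, $\rho \approx \frac{1}{2}$ and $\tau^2 \approx 1$, and $T$ is approximately $N(0, 1)$ as $m\ (\approx n) \to \infty$.
In this case, a t-test based on $T$ will have approximately the right level.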


10.5 F-TESTS 
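The term F-tests refers to tests based on an F-statistic. Let $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_n$ be independent samples from $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively.
We recall that $\sum_1^m (X_i - \bar X)^2/\sigma_1^2 \sim \chi^2(m-1)$ and $\sum_1^n (Y_i - \bar Y)^2/\sigma_2^2 \sim \chi^2(n-1)$
are independent RVs, so that the RV

$$F(\mathbf{X}, \mathbf{Y}) = \frac{\sum_1^m (X_i - \bar X)^2}{\sum_1^n (Y_i - \bar Y)^2}\cdot\frac{\sigma_2^2 (n-1)}{\sigma_1^2 (m-1)} = \frac{\sigma_2^2}{\sigma_1^2}\,\frac{S_1^2}{S_2^2}$$

is distributed as $F(m-1, n-1)$.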
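The following table summarizes the F-tests:

Reject $H_0$ at level $\alpha$ if:

I. $H_0\colon \sigma_1^2 \le \sigma_2^2$ against $H_1\colon \sigma_1^2 > \sigma_2^2$.
   $\mu_1, \mu_2$ known: $\dfrac{\sum_1^m (x_i - \mu_1)^2}{\sum_1^n (y_i - \mu_2)^2} \ge \dfrac{m}{n} F_{m,n,\alpha}$.
   $\mu_1, \mu_2$ unknown: $\dfrac{s_1^2}{s_2^2} \ge F_{m-1,n-1,\alpha}$.

II. $H_0\colon \sigma_1^2 \ge \sigma_2^2$ against $H_1\colon \sigma_1^2 < \sigma_2^2$.
   $\mu_1, \mu_2$ known: $\dfrac{\sum_1^n (y_i - \mu_2)^2}{\sum_1^m (x_i - \mu_1)^2} \ge \dfrac{n}{m} F_{n,m,\alpha}$.
   $\mu_1, \mu_2$ unknown: $\dfrac{s_2^2}{s_1^2} \ge F_{n-1,m-1,\alpha}$.

III. $H_0\colon \sigma_1^2 = \sigma_2^2$ against $H_1\colon \sigma_1^2 \ne \sigma_2^2$.
   $\mu_1, \mu_2$ known: $\dfrac{\sum_1^m (x_i - \mu_1)^2}{\sum_1^n (y_i - \mu_2)^2} \ge \dfrac{m}{n} F_{m,n,\alpha/2}$ or $\le \dfrac{m}{n} F_{m,n,1-\alpha/2}$.
   $\mu_1, \mu_2$ unknown: $\dfrac{s_1^2}{s_2^2} \ge F_{m-1,n-1,\alpha/2}$ or $\le F_{m-1,n-1,1-\alpha/2}$.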


Remark 1. Recall (Remark 7.4.5) that
$$F_{m,n,1-\alpha} = \{F_{n,m,\alpha}\}^{-1}.$$


Remark 2. The tests described above can easily be obtained from the likelihood
ratio procedure. Moreover, in the important case where $\mu_1, \mu_2$ are unknown, tests I
and II are UMP unbiased and UMP invariant. For test III we have chosen equal tails,
as is customarily done for convenience even though the unbiasedness property of the
test is thereby destroyed.


Example 1 (Example 10.4.2 continued). In Example 10.4.2 let us test the
validity of the assumption on which the t-test was based, namely, that the two
populations have the same variance, at level 0.05. We compute $s_1^2/s_2^2 = (420/390)^2 =
196/169 = 1.16$. Since $F_{m-1,n-1,\alpha/2} = F_{8,15,0.025} = 3.20$, we cannot reject
$H_0\colon \sigma_1 = \sigma_2$.
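The arithmetic of Example 1 can be checked with a few lines of Python. This is only a sketch: the function name `variance_ratio` is illustrative, and the critical value 3.20 is simply the tabulated figure quoted above rather than something the code computes.

```python
def variance_ratio(s1, s2):
    """Return the F statistic s1^2 / s2^2 from two sample standard deviations."""
    return (s1 / s2) ** 2


# Sample standard deviations used in Example 1.
f_stat = variance_ratio(420.0, 390.0)

# Upper critical value F_{8,15,0.025} as quoted in the text (taken from tables).
upper_critical = 3.20

# The observed ratio exceeds 1 and the lower critical value of the equal-tails
# test is below 1, so only the upper tail can lead to rejection here.
reject = f_stat >= upper_critical

print(round(f_stat, 2), reject)
```

Since the ratio is far below the tabulated value, the code reaches the same conclusion as the text: the hypothesis of equal variances is not rejected.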


An important application of the F-test involves the case where one is testing the
equality of means of two normal populations under the assumption that the variances
are the same, that is, testing whether the two samples come from the same population.
Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from $\mathcal{N}(\mu_1, \sigma_1^2)$
and $\mathcal{N}(\mu_2, \sigma_2^2)$, respectively. If $\sigma_1^2 = \sigma_2^2$ but is unknown, the t-test rejects $H_0\colon \mu_1 =
\mu_2$ if $|T| > c$, where $c$ is selected so that $\alpha_2 = P\{|T| > c \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\}$, that
is, $c = t_{m+n-2,\alpha_2/2}\, s_p \sqrt{1/m + 1/n}$, where

$$s_p^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2},$$

$s_1^2, s_2^2$ being the sample variances. If first an F-test is performed to test $\sigma_1 = \sigma_2$,
and then a t-test to test $\mu_1 = \mu_2$ at levels $\alpha_1$ and $\alpha_2$, respectively, the probability of
accepting both hypotheses when they are true is

$$P\{|T| \le c,\ c_1 < F < c_2 \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\};$$

and if $F$ is independent of $T$, this probability is $(1-\alpha_1)(1-\alpha_2)$. It follows that the
combined test has a significance level $\alpha = 1 - (1-\alpha_1)(1-\alpha_2)$. We see that

$$\alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2 \le \alpha_1 + \alpha_2$$

and $\alpha \ge \max(\alpha_1, \alpha_2)$. In fact, $\alpha$ will be closer to $\alpha_1 + \alpha_2$, since for small $\alpha_1$ and $\alpha_2$,
$\alpha_1\alpha_2$ will be closer to 0.

We show that $F$ is independent of $T$ whenever $\sigma_1 = \sigma_2$. The statistic $V =
(\bar X, \bar Y, \sum_1^m (X_i - \bar X)^2 + \sum_1^n (Y_i - \bar Y)^2)$ is a complete sufficient statistic for the
parameter $(\mu_1, \mu_2, \sigma_1 = \sigma_2)$ (see Theorem 8.3.2). Since the distribution of $F$ does not
depend on $\mu_1$, $\mu_2$, and $\sigma_1 = \sigma_2$, it follows (Problem 5) that $F$ is independent of $V$
whenever $\sigma_1 = \sigma_2$. But $T$ is a function of $V$ alone, so that $F$ must be independent of
$T$ also.

In Example 1, the combined test has a significance level of 


$$\alpha = 1 - (0.95)(0.95) = 1 - 0.9025 = 0.0975.$$
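As a quick numerical companion to the formula $\alpha = 1 - (1-\alpha_1)(1-\alpha_2)$, the following sketch (the function name `combined_level` is an illustrative choice) evaluates the level of the two-stage procedure and confirms the bounds $\max(\alpha_1, \alpha_2) \le \alpha \le \alpha_1 + \alpha_2$. It assumes, as in the text, that $F$ and $T$ are independent.

```python
def combined_level(alpha1, alpha2):
    """Level of an F-test followed by a t-test when the two statistics are independent."""
    return 1.0 - (1.0 - alpha1) * (1.0 - alpha2)


# Both tests at level 0.05, as in Example 1.
alpha = combined_level(0.05, 0.05)
print(round(alpha, 4))
```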


PROBLEMS 10.5 


1. For the data of Problem 10.4.4, is the assumption of equality of variances on 
which the t-test is based, valid? 


2. Answer the same question for Problems 10.4.5 and 10.4.6. 


3. The performance of each of two different dive-bombing methods is measured a 
dozen times. The sample variances for the two methods are computed to be 5545 
and 4073, respectively. Do the two methods differ in variability? 


4. In Problem 3, does the variability of the first method exceed that of the second 
method? 


5. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with PDF
(PMF) $f(x; \theta)$, $\theta \in \Theta$, where $\Theta$ is an interval in $\mathcal{R}_k$. Let $T(\mathbf{X})$ be a complete
sufficient statistic for the family $\{f(x; \theta)\colon \theta \in \Theta\}$. If $U(\mathbf{X})$ is a statistic (not a
function of $T$ alone) whose distribution does not depend on $\theta$, show that $U$ is
independent of $T$.


10.6 BAYES AND MINIMAX PROCEDURES 


Let $X_1, X_2, \ldots, X_n$ be a sample from a probability distribution with PDF (PMF) $f_\theta$,
$\theta \in \Theta$. In Section 8.8 we described the general decision problem, namely, once the
statistician observes $\mathbf{x}$, she has a set $\mathcal{A}$ of options available. The problem is to find
a decision function $\delta$ that minimizes the risk $R(\theta, \delta) = E_\theta L(\theta, \delta)$ in some sense.
Thus a minimax solution requires the minimization of $\max_\theta R(\theta, \delta)$, while a Bayes
solution requires the minimization of $R(\pi, \delta) = E R(\theta, \delta)$, where $\pi$ is the a priori
distribution on $\Theta$. In Remark 9.2.1 we considered the problem of hypothesis testing
as a special case of the general decision problem. The set $\mathcal{A}$ contains two points, $a_0$
and $a_1$; $a_0$ corresponds to the acceptance of $H_0\colon \theta \in \Theta_0$, and $a_1$ corresponds to the
rejection of $H_0$. Suppose that the loss function is defined by

$$\begin{aligned}
L(\theta, a_0) &= a(\theta) \quad \text{if } \theta \in \Theta_1, \quad a(\theta) > 0,\\
L(\theta, a_1) &= b(\theta) \quad \text{if } \theta \in \Theta_0, \quad b(\theta) > 0,\\
L(\theta, a_0) &= 0 \quad \text{if } \theta \in \Theta_0,\\
L(\theta, a_1) &= 0 \quad \text{if } \theta \in \Theta_1.
\end{aligned} \tag{1}$$


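Then

$$R(\theta, \delta(\mathbf{X})) = L(\theta, a_0) P_\theta\{\delta(\mathbf{X}) = a_0\} + L(\theta, a_1) P_\theta\{\delta(\mathbf{X}) = a_1\} \tag{2}$$

$$= \begin{cases} a(\theta) P_\theta\{\delta(\mathbf{X}) = a_0\} & \text{if } \theta \in \Theta_1,\\ b(\theta) P_\theta\{\delta(\mathbf{X}) = a_1\} & \text{if } \theta \in \Theta_0. \end{cases} \tag{3}$$

A minimax solution to the problem of testing $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$,
where $\Theta = \Theta_0 + \Theta_1$, is to find a rule $\delta$ that minimizes

$$\max_\theta\,[a(\theta) P_\theta\{\delta(\mathbf{X}) = a_0\},\ b(\theta) P_\theta\{\delta(\mathbf{X}) = a_1\}].$$

We will consider here only the special case of testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta =
\theta_1$. In that case we want to find a rule $\delta$ that minimizes

$$\max\,[a P_{\theta_1}\{\delta(\mathbf{X}) = a_0\},\ b P_{\theta_0}\{\delta(\mathbf{X}) = a_1\}]. \tag{4}$$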
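We will show that the solution is to reject $H_0$ if

$$\frac{f_{\theta_1}(\mathbf{x})}{f_{\theta_0}(\mathbf{x})} \ge k, \tag{5}$$

provided that the constant $k$ is chosen so that

$$R(\theta_0, \delta(\mathbf{X})) = R(\theta_1, \delta(\mathbf{X})), \tag{6}$$

where $\delta$ is the rule defined in (5); that is, the minimax rule $\delta$ is obtained if we choose
$k$ in (5) so that

$$a P_{\theta_1}\{\delta(\mathbf{X}) = a_0\} = b P_{\theta_0}\{\delta(\mathbf{X}) = a_1\}, \tag{7}$$

or, equivalently, we choose $k$ so that

$$a P_{\theta_1}\left\{\frac{f_{\theta_1}(\mathbf{X})}{f_{\theta_0}(\mathbf{X})} < k\right\} = b P_{\theta_0}\left\{\frac{f_{\theta_1}(\mathbf{X})}{f_{\theta_0}(\mathbf{X})} \ge k\right\}. \tag{8}$$

Let $\delta^*$ be any other rule. If $R(\theta_0, \delta) < R(\theta_0, \delta^*)$, then $R(\theta_0, \delta) = R(\theta_1, \delta) <
\max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)]$ and $\delta^*$ cannot be minimax. Thus we need only consider the
case $R(\theta_0, \delta) \ge R(\theta_0, \delta^*)$, which means that

$$P_{\theta_0}\{\delta^*(\mathbf{X}) = a_1\} \le P_{\theta_0}\{\delta(\mathbf{X}) = a_1\} = P\{\text{reject } H_0 \mid H_0 \text{ true}\}. \tag{9}$$

By the Neyman-Pearson lemma, rule $\delta$ is the most powerful of its size, so that its
power must be at least that of $\delta^*$, that is,

$$P_{\theta_1}\{\delta(\mathbf{X}) = a_1\} \ge P_{\theta_1}\{\delta^*(\mathbf{X}) = a_1\},$$

so that

$$P_{\theta_1}\{\delta(\mathbf{X}) = a_0\} \le P_{\theta_1}\{\delta^*(\mathbf{X}) = a_0\}.$$

It follows that

$$a P_{\theta_1}\{\delta(\mathbf{X}) = a_0\} \le a P_{\theta_1}\{\delta^*(\mathbf{X}) = a_0\}$$

and hence that

$$R(\theta_1, \delta) \le R(\theta_1, \delta^*). \tag{10}$$

This means that

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] = R(\theta_1, \delta) \le R(\theta_1, \delta^*)$$

and thus

$$\max[R(\theta_0, \delta), R(\theta_1, \delta)] \le \max[R(\theta_0, \delta^*), R(\theta_1, \delta^*)].$$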


Note that in the discrete case one may need some randomization procedure in 
order to achieve equality in (8). 


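Example 1. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. To test $H_0\colon \mu = \mu_0$
against $H_1\colon \mu = \mu_1\ (> \mu_0)$, we should choose $k$ so that (8) is satisfied. This is the
same as choosing $c$, and thus $k$, so that

$$a P_{\mu_1}\{\bar X < c\} = b P_{\mu_0}\{\bar X \ge c\}$$

or

$$a P_{\mu_1}\left\{\frac{\bar X - \mu_1}{1/\sqrt n} < \frac{c - \mu_1}{1/\sqrt n}\right\} = b P_{\mu_0}\left\{\frac{\bar X - \mu_0}{1/\sqrt n} \ge \frac{c - \mu_0}{1/\sqrt n}\right\}.$$

Thus

$$a\,\Phi[\sqrt n (c - \mu_1)] = b\,\{1 - \Phi[\sqrt n (c - \mu_0)]\},$$

where $\Phi$ is the DF of an $\mathcal{N}(0, 1)$ RV. This can easily be accomplished with the help
of normal tables once we know $a$, $b$, $\mu_0$, $\mu_1$, and $n$.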
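In place of normal tables, the equation $a\,\Phi[\sqrt n(c - \mu_1)] = b\{1 - \Phi[\sqrt n(c - \mu_0)]\}$ can be solved numerically. The sketch below uses bisection, which works because the left side minus the right side is increasing in $c$. The function name `minimax_cutoff`, the bracketing interval, and the numerical inputs are illustrative assumptions, not part of the text.

```python
from statistics import NormalDist


def minimax_cutoff(a, b, mu0, mu1, n, iterations=200):
    """Solve a*Phi(sqrt(n)(c - mu1)) = b*(1 - Phi(sqrt(n)(c - mu0))) for c by bisection."""
    phi = NormalDist().cdf
    root_n = n ** 0.5

    def gap(c):
        # Risk at mu1 minus risk at mu0 for the rule "reject H0 when xbar >= c".
        return a * phi(root_n * (c - mu1)) - b * (1.0 - phi(root_n * (c - mu0)))

    # gap is negative at the left end and positive at the right end of this bracket.
    lo = mu0 - 40.0 / root_n
    hi = mu1 + 40.0 / root_n
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Equal losses: by symmetry the cutoff is the midpoint of the two means.
c_equal = minimax_cutoff(1.0, 1.0, 0.0, 1.0, 4)
# A larger loss for wrongly accepting H0 moves the cutoff toward mu0.
c_skewed = minimax_cutoff(2.0, 1.0, 0.0, 1.0, 4)
print(round(c_equal, 6), c_skewed < c_equal)
```

With $a = b$ the solution is $c = (\mu_0 + \mu_1)/2$, which gives a convenient check on the numerical routine.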


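We next consider the problem of testing $H_0\colon \theta \in \Theta_0$ against $H_1\colon \theta \in \Theta_1$ from a
Bayesian point of view. Let $\pi(\theta)$ be the a priori probability distribution on $\Theta$. Then

$$R(\pi, \delta) = E_\theta R(\theta, \delta(\mathbf{X}))
= \begin{cases} \int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta & \text{if } \pi \text{ is a PDF},\\ \sum_\Theta R(\theta, \delta)\pi(\theta) & \text{if } \pi \text{ is a PMF}, \end{cases}$$

$$= \begin{cases} \int_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(\mathbf{X}) = a_1\}\,d\theta + \int_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(\mathbf{X}) = a_0\}\,d\theta & \text{if } \pi \text{ is a PDF},\\ \sum_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(\mathbf{X}) = a_1\} + \sum_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(\mathbf{X}) = a_0\} & \text{if } \pi \text{ is a PMF}. \end{cases} \tag{11}$$

The Bayes solution is a decision rule that minimizes $R(\pi, \delta)$. In what follows we
restrict our attention to the case where both $H_0$ and $H_1$ have exactly one point each,
that is, $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$. Let $\pi(\theta_0) = \pi_0$ and $\pi(\theta_1) = 1 - \pi_0 = \pi_1$. Then

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(\mathbf{X}) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(\mathbf{X}) = a_0\}, \tag{12}$$

where $b(\theta_0) = b$, $a(\theta_1) = a$ $(a, b > 0)$.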

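Theorem 1. Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ be an RV of the discrete (continuous)
type with PMF (PDF) $f_\theta$, $\theta \in \Theta = \{\theta_0, \theta_1\}$. Let $\pi(\theta_0) = \pi_0$, $\pi(\theta_1) = 1 - \pi_0 = \pi_1$
be the a priori probability mass function on $\Theta$. A Bayes solution for testing $H_0\colon \mathbf{X} \sim
f_{\theta_0}$ against $H_1\colon \mathbf{X} \sim f_{\theta_1}$, using the loss function (1), is to reject $H_0$ if

$$\frac{f_{\theta_1}(\mathbf{x})}{f_{\theta_0}(\mathbf{x})} \ge \frac{b\pi_0}{a\pi_1}. \tag{13}$$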


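Proof. We wish to find $\delta$ that minimizes

$$R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(\mathbf{X}) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(\mathbf{X}) = a_0\}.$$

Now

$$R(\pi, \delta) = E_\theta R(\theta, \delta) = E\{E_\theta\{L(\theta, \delta) \mid \mathbf{X}\}\},$$

so it suffices to minimize $E_\theta\{L(\theta, \delta) \mid \mathbf{X}\}$.
The a posteriori distribution of $\theta$ is given by

$$h(\theta \mid \mathbf{x}) = \frac{\pi(\theta) f_\theta(\mathbf{x})}{\sum_\theta f_\theta(\mathbf{x})\pi(\theta)} = \frac{\pi(\theta) f_\theta(\mathbf{x})}{\pi_0 f_{\theta_0}(\mathbf{x}) + \pi_1 f_{\theta_1}(\mathbf{x})} \tag{14}$$

$$= \begin{cases} \dfrac{\pi_0 f_{\theta_0}(\mathbf{x})}{\pi_0 f_{\theta_0}(\mathbf{x}) + \pi_1 f_{\theta_1}(\mathbf{x})} & \text{if } \theta = \theta_0,\\[2ex] \dfrac{\pi_1 f_{\theta_1}(\mathbf{x})}{\pi_0 f_{\theta_0}(\mathbf{x}) + \pi_1 f_{\theta_1}(\mathbf{x})} & \text{if } \theta = \theta_1. \end{cases}$$

Thus

$$E_\theta\{L(\theta, \delta(\mathbf{X})) \mid \mathbf{X} = \mathbf{x}\} = \begin{cases} b\,h(\theta_0 \mid \mathbf{x}), & \theta = \theta_0,\ \delta(\mathbf{x}) = a_1,\\ a\,h(\theta_1 \mid \mathbf{x}), & \theta = \theta_1,\ \delta(\mathbf{x}) = a_0. \end{cases}$$

It follows that we reject $H_0$, that is, $\delta(\mathbf{x}) = a_1$, if

$$b\,h(\theta_0 \mid \mathbf{x}) \le a\,h(\theta_1 \mid \mathbf{x}),$$

which is the case if and only if

$$b\pi_0 f_{\theta_0}(\mathbf{x}) \le a\pi_1 f_{\theta_1}(\mathbf{x}),$$

as asserted.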


Remark 1. In the Neyman-Pearson lemma we fixed $P_{\theta_0}\{\delta(\mathbf{X}) = a_1\}$, the
probability of rejecting $H_0$ when it is true, and minimized $P_{\theta_1}\{\delta(\mathbf{X}) = a_0\}$, the
probability of accepting $H_0$ when it is false. Here we no longer have a fixed level $\alpha$ for
$P_{\theta_0}\{\delta(\mathbf{X}) = a_1\}$. Instead, we allow it to assume any value as long as $R(\pi, \delta)$, defined
in (12), is minimum.


Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions.
Let $X$ be an RV with PDF (PMF) $f_\theta$, where $\theta$ can take any of the $k$ values
$\theta_1, \theta_2, \ldots, \theta_k$. The problem is to observe $x$ and decide which of the $\theta_i$'s is the
correct value of $\theta$. Let us write $H_i\colon \theta = \theta_i$, $i = 1, 2, \ldots, k$, and assume that
$\pi(\theta_i) = \pi_i$, $i = 1, 2, \ldots, k$, $\sum_1^k \pi_i = 1$, is the prior probability distribution on
$\Theta = \{\theta_1, \theta_2, \ldots, \theta_k\}$. Let

$$L(\theta_i, \delta) = \begin{cases} 1 & \text{if } \delta \text{ chooses } \theta_j,\ j \ne i,\\ 0 & \text{if } \delta \text{ chooses } \theta_i. \end{cases}$$

The problem is to find a rule $\delta$ that minimizes $R(\pi, \delta)$. We leave the reader to show
that a Bayes solution is to accept $H_i\colon \theta = \theta_i$ $(i = 1, 2, \ldots, k)$ if

$$\pi_i f_{\theta_i}(x) \ge \pi_j f_{\theta_j}(x) \quad \text{for all } j \ne i,\ j = 1, 2, \ldots, k, \tag{15}$$

where any point lying in more than one such region is assigned to any one of them.


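Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $\mathcal{N}(\mu, 1)$ RVs. To test $H_0\colon \mu = \mu_0$
against $H_1\colon \mu = \mu_1\ (> \mu_0)$, let us take $a = b$ in the loss function (1). Then
Theorem 1 says that the Bayes rule is one that rejects $H_0$ if

$$\frac{f_{\theta_1}(\mathbf{x})}{f_{\theta_0}(\mathbf{x})} \ge \frac{\pi_0}{1 - \pi_0},$$

that is,

$$\exp\left\{-\frac{\sum_1^n (x_i - \mu_1)^2}{2} + \frac{\sum_1^n (x_i - \mu_0)^2}{2}\right\} \ge \frac{\pi_0}{1 - \pi_0}$$

and

$$\exp\left\{(\mu_1 - \mu_0)\sum_1^n x_i + \frac{n(\mu_0^2 - \mu_1^2)}{2}\right\} \ge \frac{\pi_0}{1 - \pi_0}.$$

This happens if and only if

$$\frac{1}{n}\sum_1^n x_i \ge \frac{1}{n}\,\frac{\log[\pi_0/(1 - \pi_0)]}{\mu_1 - \mu_0} + \frac{\mu_0 + \mu_1}{2},$$

where the logarithm is to the base $e$. It follows that, if $\pi_0 = \frac{1}{2}$, the rejection region
consists of

$$\bar x \ge \frac{\mu_0 + \mu_1}{2}.$$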

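Example 3. This example illustrates the result described in Remark 2. Let
$X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, 1)$, and suppose that $\mu$ can take any one
of the three values $\mu_1$, $\mu_2$, or $\mu_3$. Let $\mu_1 < \mu_2 < \mu_3$. Assume, for simplicity, that
$\pi_1 = \pi_2 = \pi_3$. Then we accept $H_i\colon \mu = \mu_i$, $i = 1, 2, 3$, if

$$\pi_i \exp\left\{-\sum_{k=1}^n \frac{(x_k - \mu_i)^2}{2}\right\} \ge \pi_j \exp\left\{-\sum_{k=1}^n \frac{(x_k - \mu_j)^2}{2}\right\}$$

for each $j \ne i$, $j = 1, 2, 3$.

It follows that we accept $H_i$ if

$$(\mu_i - \mu_j)\bar x + \frac{\mu_j^2 - \mu_i^2}{2} \ge 0, \quad j = 1, 2, 3\ (j \ne i),$$

that is,

$$\bar x(\mu_i - \mu_j) \ge \frac{(\mu_i - \mu_j)(\mu_i + \mu_j)}{2}.$$

Thus the acceptance region of $H_1$ is given by

$$\bar x \le \frac{\mu_1 + \mu_2}{2} \quad \text{and} \quad \bar x \le \frac{\mu_1 + \mu_3}{2}.$$

Also, the acceptance region of $H_2$ is given by

$$\bar x \ge \frac{\mu_1 + \mu_2}{2} \quad \text{and} \quad \bar x \le \frac{\mu_2 + \mu_3}{2},$$

and that of $H_3$ by

$$\bar x \ge \frac{\mu_1 + \mu_3}{2} \quad \text{and} \quad \bar x \ge \frac{\mu_2 + \mu_3}{2}.$$

In particular, if $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 4$, we accept $H_1$ if $\bar x \le 1$, $H_2$ if $1 \le \bar x \le 3$,
and $H_3$ if $\bar x \ge 3$. In this case, boundary points 1 and 3 have zero probability, and it
does not matter where we include them.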
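Rule (15) for normal means can be sketched in code by comparing $\log \pi_j + n(\mu_j \bar x - \mu_j^2/2)$ across the candidate means, which is the logarithm of $\pi_j f_{\mu_j}(\mathbf{x})$ up to a term not depending on $j$. The function name `bayes_choice` is an illustrative assumption, and ties are broken in favour of the first maximizer, which is harmless because boundary points have zero probability.

```python
import math


def bayes_choice(xbar, n, means, priors):
    """Index i maximizing pi_i * f_{mu_i}(x) for iid N(mu, 1) data with sample mean xbar."""
    scores = [
        math.log(p) + n * (m * xbar - 0.5 * m * m)
        for m, p in zip(means, priors)
    ]
    return max(range(len(means)), key=scores.__getitem__)


means = [0.0, 2.0, 4.0]          # the values mu_1, mu_2, mu_3 of Example 3
priors = [1 / 3, 1 / 3, 1 / 3]   # equal prior probabilities

# Index 0 corresponds to H_1, index 1 to H_2, index 2 to H_3.
decisions = [bayes_choice(x, 10, means, priors) for x in (0.4, 2.9, 3.5)]
print(decisions)
```

With equal priors the rule reduces to picking the mean nearest to $\bar x$, in agreement with the cut points 1 and 3 found above.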


PROBLEMS 10.6 


1. In Example 1, let $n = 15$, $\mu_0 = 4.7$, and $\mu_1 = 5.2$, and choose $a = b > 0$. Find
the minimax test and compute its power at $\mu = 4.7$ and $\mu = 5.2$.


2. A sample of five observations is taken on a b(1, 6) RV to test Ho: 0 = 5 against 
Hy: 6 =}. 
(a) Find the most powerful test of size a = 0.05. 
(b) If L(4, 4) = LG, 32) = 0, LG, 3) = 1, and LG, 5) = 2, find the minimax 
tule. 
(c) If the prior probabilities of 9 = 5 and @ = 3 are 19 = ; and 7, = z, 
respectively, find the Bayes rule. 


3. A sample of size $n$ is to be used from the PDF
$$f_\theta(x) = \theta e^{-\theta x}, \quad x > 0,$$


to test Ho: 0 = 1 against H): 0 = 2. If the a priori distribution on @ is 7p = 2, 
m= i, and a = b, find the Bayes solution. Find the power of the test at@ = 1 
and 6 = 2. 


4. Given two normal densities with variances 1 and with means —1 and 1, respec- 
tively, find the Bayes solution based on a single observation when a = b and 
(a) m = 7 = 5, and (b) 779 = tm = 3. 

5. Given three normal densities with variances 1 and with means —1, 0, 1, respec- 
tively, find the Bayes solution to the multiple decision problem based on a single 
observation when 21 = z, = 2, R= 7 


6. For the multiple decision problem described in Remark 2, show that a Bayes
solution is to accept $H_i\colon \theta = \theta_i$ $(i = 1, 2, \ldots, k)$ if (15) holds.


CHAPTER 11


Confidence Estimation 


11.1 INTRODUCTION 


In many problems of statistical inference the experimenter is interested in
constructing a family of sets that contain the true (unknown) parameter value with a specified
(high) probability. If $X$, for example, represents the length of life of a piece of
equipment, the experimenter is interested in a lower bound $\underline{\theta}$ for the mean $\theta$ of $X$. Since
$\underline{\theta} = \underline{\theta}(X)$ will be a function of the observations, one cannot ensure with
probability 1 that $\underline{\theta}(X) \le \theta$. All that one can do is to choose a number $1 - \alpha$ that is close to 1
so that $P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha$ for all $\theta$. Problems of this type are called problems of
confidence estimation. In this chapter we restrict ourselves mostly to the case where
$\Theta \subseteq \mathcal{R}$ and consider the problem of setting confidence limits for the parameter $\theta$.

In Section 11.2 we introduce the basic ideas of confidence estimation.
Section 11.3 deals with various methods of finding confidence intervals, while
Section 11.4 deals with shortest-length confidence intervals. In Section 11.5 we study
unbiased and equivariant confidence intervals.


11.2 SOME FUNDAMENTAL NOTIONS OF CONFIDENCE ESTIMATION
So far we have considered a random variable or some function of it as the basic
observable quantity. Let $X$ be an RV, and $a$, $b$ be two given positive real numbers.
Then

$$P\{a < X < b\} = P\{a < X \text{ and } X < b\} = P\left\{\frac{bX}{a} > b \text{ and } X < b\right\} = P\left\{X < b < \frac{bX}{a}\right\},$$

and if we know the distribution of $X$ and $a$, $b$, we can determine the probability
$P\{a < X < b\}$. Consider the interval $I(X) = (X, bX/a)$. This is an interval with
endpoints that are functions of the RV $X$, and hence it takes the value $(x, bx/a)$
when $X$ takes the value $x$. In other words, $I(X)$ assumes the value $I(x)$ whenever $X$
assumes the value $x$. Thus $I(X)$ is a random quantity and is an example of a random
interval. Note that $I(X)$ includes the value $b$ with a certain fixed probability. For
example, if $b = 1$, $a = \frac{1}{2}$ and $X$ is $U(0, 1)$, the interval $(X, 2X)$ includes point 1 with
probability $\frac{1}{2}$. We note that $I(X)$ is a family of intervals with associated coverage
probability $P\{I(X) \ni 1\} = \frac{1}{2}$. It has (random) length $l(I(X)) = 2X - X = X$. In
general, the larger the length of the interval, the larger the coverage probability. Let
us formalize these notions.
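The coverage probability $P\{X < 1 < 2X\} = \frac{1}{2}$ for $X \sim U(0, 1)$ can be checked by simulation. The sketch below is only illustrative: the function name `coverage_of_one`, the seed, and the number of trials are arbitrary choices.

```python
import random


def coverage_of_one(trials, seed=0):
    """Estimate P{(X, 2X) contains 1} for X ~ U(0, 1) by Monte Carlo."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.random()            # one draw of X
        if x < 1.0 < 2.0 * x:       # does the random interval (x, 2x) cover the point 1?
            hits += 1
    return hits / trials


estimate = coverage_of_one(100_000)
print(estimate)
```

The estimate should be close to 0.5, since the interval covers 1 exactly when $X > \frac{1}{2}$.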


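Definition 1. Let $P_\theta$, $\theta \in \Theta \subseteq \mathcal{R}_k$, be the set of probability distributions of
an RV $X$. A family of subsets $S(x)$ of $\Theta$, where $S(x)$ depends on the observation $x$
but not on $\theta$, is called a family of random sets. If, in particular, $\Theta \subseteq \mathcal{R}$ and $S(x)$
is an interval $(\underline{\theta}(x), \overline{\theta}(x))$, where $\underline{\theta}(x)$ and $\overline{\theta}(x)$ are functions of $x$ alone (and not
$\theta$), we call $S(X)$ a random interval with $\underline{\theta}(X)$ and $\overline{\theta}(X)$ as lower and upper bounds,
respectively. $\underline{\theta}(X)$ may be $-\infty$, and $\overline{\theta}(X)$ may be $+\infty$.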


In a wide variety of inference problems, one is not interested in estimating the 
parameter or testing some hypothesis concerning it. Rather, one wishes to establish 
a lower or an upper bound, or both, for the real-valued parameter. For example, if X 
is the time to failure of a piece of equipment, one may be interested in a lower bound 
for the mean of X. If the RV X measures the toxicity of a drug, the concern is to find 
an upper bound for the mean. Similarly, if the RV X measures the nicotine content 
of a certain brand of cigarettes, one may be interested in determining an upper and a 
lower bound for the average nicotine content of these cigarettes. 

In this chapter we are interested in the problem of confidence estimation, namely,
that of finding a family of random sets $S(x)$ for a parameter $\theta$ such that for a given
$\alpha$, $0 < \alpha < 1$ (usually small),

$$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta. \tag{1}$$

We restrict our attention mainly to the case where $\theta \in \Theta \subseteq \mathcal{R}$.

Definition 2. Let $\theta \in \Theta \subseteq \mathcal{R}$ and $0 < \alpha < 1$. A function $\underline{\theta}(X)$ satisfying

$$P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha \quad \text{for all } \theta \tag{2}$$

is called a lower confidence bound for $\theta$ at confidence level $1 - \alpha$. The quantity

$$\inf_{\theta \in \Theta} P_\theta\{\underline{\theta}(X) \le \theta\} \tag{3}$$

is called the confidence coefficient.


Definition 3. A function $\underline{\theta}$ that minimizes

$$P_\theta\{\underline{\theta}(X) \le \theta'\} \quad \text{for all } \theta' < \theta \tag{4}$$

subject to (2) is known as a uniformly most accurate (UMA) lower confidence bound
for $\theta$ at confidence level $1 - \alpha$.


Remark 1. Suppose that $X \sim P_\theta$ and (2) holds. Then the smallest probability of
true coverage, $P_\theta\{\underline{\theta}(X) \le \theta\} = P_\theta\{[\underline{\theta}(X), \infty) \ni \theta\}$, is $1 - \alpha$. The probability of
false (or incorrect) coverage is $P_\theta\{[\underline{\theta}(X), \infty) \ni \theta'\} = P_\theta\{\underline{\theta}(X) \le \theta'\}$ for $\theta' < \theta$.
According to Definition 3, among the class of all lower confidence bounds satisfying
(2), a UMA lower confidence bound has the smallest probability of false coverage.


Similar definitions are given for an upper confidence bound for $\theta$ and a UMA
upper confidence bound.


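Definition 4. A family of subsets $S(x)$ of $\Theta \subseteq \mathcal{R}_k$ is said to constitute a family
of confidence sets at confidence level $1 - \alpha$ if

$$P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha \quad \text{for all } \theta \in \Theta, \tag{5}$$

that is, the random set $S(X)$ covers the true parameter value $\theta$ with probability
$\ge 1 - \alpha$. A lower confidence bound corresponds to the special case where $k = 1$ and

$$S(x) = \{\theta\colon \underline{\theta}(x) \le \theta < \infty\}; \tag{6}$$

and an upper confidence bound to the case where

$$S(x) = \{\theta\colon \overline{\theta}(x) \ge \theta > -\infty\}. \tag{7}$$

If $S(x)$ is of the form

$$S(x) = (\underline{\theta}(x), \overline{\theta}(x)), \tag{8}$$

we will call it a confidence interval at confidence level $1 - \alpha$, provided that

$$P_\theta\{\underline{\theta}(X) < \theta < \overline{\theta}(X)\} \ge 1 - \alpha \quad \text{for all } \theta, \tag{9}$$

and the quantity

$$\inf_\theta P_\theta\{\underline{\theta}(X) < \theta < \overline{\theta}(X)\} \tag{10}$$

will be referred to as the confidence coefficient associated with the random interval.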


Remark 2. We write $S(X) \ni \theta$ to indicate that $X$, and hence $S(X)$, is random
here and not $\theta$, so the probability distribution referred to is that of $X$.


Remark 3. When $X = x$ is the realization, the confidence interval (set) $S(x)$ is
a fixed subset of $\mathcal{R}_k$. No probability is attached to $S(x)$ itself since neither $\theta$ nor
$S(x)$ has a probability distribution. In fact, either $S(x)$ covers $\theta$ or it does not, and
we will never know which since $\theta$ is unknown. One can give a relative frequency
interpretation. If $(1 - \alpha)$-level confidence sets for $\theta$ were computed a large number of
times, a fraction (approximately) $1 - \alpha$ of these would contain the true (but unknown)
parameter value.


Definition 5. A family of $(1 - \alpha)$-level confidence sets $\{S(x)\}$ is said to be a UMA
family of confidence sets at level $1 - \alpha$ if

$$P_\theta\{S(X) \text{ contains } \theta'\} \le P_\theta\{S'(X) \text{ contains } \theta'\}$$

for all $\theta \ne \theta'$ and any $(1 - \alpha)$-level family of confidence sets $S'(X)$.

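Example 1. Let $X_1, X_2, \ldots, X_n$ be iid RVs, $X_i \sim \mathcal{N}(\mu, \sigma^2)$. Consider the
interval $(\bar X - c_1, \bar X + c_2)$. In order for this to be a $(1 - \alpha)$-level confidence interval,
we must have

$$P\{\bar X - c_1 < \mu < \bar X + c_2\} \ge 1 - \alpha,$$

which is the same as

$$P\{\mu - c_2 < \bar X < \mu + c_1\} \ge 1 - \alpha.$$

Thus

$$P\left\{-\frac{c_2}{\sigma}\sqrt n < \frac{\bar X - \mu}{\sigma}\sqrt n < \frac{c_1}{\sigma}\sqrt n\right\} \ge 1 - \alpha.$$

Since $\sqrt n(\bar X - \mu)/\sigma \sim \mathcal{N}(0, 1)$, we can choose $c_1$ and $c_2$ to have equality, namely,

$$P\left\{-\frac{c_2}{\sigma}\sqrt n < \frac{\bar X - \mu}{\sigma}\sqrt n < \frac{c_1}{\sigma}\sqrt n\right\} = 1 - \alpha,$$

provided that $\sigma$ is known. There are infinitely many such pairs of values $(c_1, c_2)$. In
particular, an intuitively reasonable choice is $c_1 = c_2 = c$, say. In that case

$$\frac{c\sqrt n}{\sigma} = z_{\alpha/2},$$

and the confidence interval is $(\bar X - (\sigma/\sqrt n)z_{\alpha/2},\ \bar X + (\sigma/\sqrt n)z_{\alpha/2})$. The length of
this interval is $(2\sigma/\sqrt n)z_{\alpha/2}$. Given $\sigma$ and $\alpha$, we can choose $n$ to get a confidence
interval of a fixed length.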
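For the known-$\sigma$ case, both the interval and the sample size needed for a prescribed length follow from the formula $(2\sigma/\sqrt n)z_{\alpha/2}$. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain $z_{\alpha/2}$; the function names and the numerical inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha):
    """Equal-tails (1 - alpha)-level interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def sample_size_for_length(sigma, alpha, length):
    """Smallest n for which the interval length 2*sigma*z/sqrt(n) is at most `length`."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((2.0 * z * sigma / length) ** 2)


lower, upper = z_interval(10.0, 2.0, 25, 0.05)
n_needed = sample_size_for_length(2.0, 0.05, 1.0)
print(round(lower, 3), round(upper, 3), n_needed)
```

The second function inverts the length formula in $n$, which is possible here precisely because the length does not depend on the data.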

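If $\sigma$ is not known, we have from

$$P\{-c_2 < \bar X - \mu < c_1\} \ge 1 - \alpha$$

that

$$P\left\{-\frac{c_2}{S}\sqrt n < \frac{\bar X - \mu}{S}\sqrt n < \frac{c_1}{S}\sqrt n\right\} \ge 1 - \alpha,$$

and once again we can choose pairs of values $(c_1, c_2)$ using a t-distribution with $n - 1$
d.f. such that

$$P\left\{-\frac{c_2}{S}\sqrt n < \frac{\bar X - \mu}{S}\sqrt n < \frac{c_1}{S}\sqrt n\right\} = 1 - \alpha.$$

In particular, if we take $c_1 = c_2 = c$, say, then

$$\frac{c\sqrt n}{S} = t_{n-1,\alpha/2},$$

and $(\bar X - (S/\sqrt n)t_{n-1,\alpha/2},\ \bar X + (S/\sqrt n)t_{n-1,\alpha/2})$ is a $(1 - \alpha)$-level confidence interval
for $\mu$. The length of this interval is $(2S/\sqrt n)t_{n-1,\alpha/2}$, which is no longer constant.
Therefore, we cannot choose $n$ to get a fixed-width confidence interval of level $1 - \alpha$.
Indeed, the length of this interval can be quite large if $\sigma$ is large. Its expected length
is

$$\frac{2}{\sqrt n}\,t_{n-1,\alpha/2}\,E_\sigma S = \frac{2}{\sqrt n}\,t_{n-1,\alpha/2}\sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}\,\sigma,$$

which can be made as small as we please by choosing $n$ large enough.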


Example 2. In Example 1, suppose that we wish to find a confidence interval for
$\sigma^2$ instead when $\mu$ is unknown. Consider the interval $(c_1 S^2, c_2 S^2)$, $c_1, c_2 > 0$. We
have

$$P\{c_1 S^2 < \sigma^2 < c_2 S^2\} \ge 1 - \alpha,$$

so that

$$P\left\{\frac{n-1}{c_2} < \frac{(n-1)S^2}{\sigma^2} < \frac{n-1}{c_1}\right\} \ge 1 - \alpha.$$

Since $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$, we can choose pairs of values $(c_1, c_2)$ from the
tables of the chi-square distribution. In particular, we can choose $c_1$, $c_2$ so that

$$\frac{n-1}{c_1} = \chi^2_{n-1,\alpha/2} \quad \text{and} \quad \frac{n-1}{c_2} = \chi^2_{n-1,1-\alpha/2}.$$

Then

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right)$$

is a $(1 - \alpha)$-level confidence interval for $\sigma^2$ whenever $\mu$ is unknown. If $\mu$ is known,
then

$$\frac{\sum_1^n (X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$

Thus we can base the confidence interval on $\sum_1^n (X_i - \mu)^2$. Proceeding similarly, we
get a $(1 - \alpha)$-level confidence interval as

$$\left(\frac{\sum_1^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_1^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\right).$$

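Next suppose that both $\mu$ and $\sigma^2$ are unknown and that we want a confidence set
for $(\mu, \sigma^2)$. We have from Boole's inequality

$$P\left\{\bar X - \frac{S}{\sqrt n}t_{n-1,\alpha_1/2} < \mu < \bar X + \frac{S}{\sqrt n}t_{n-1,\alpha_1/2},\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right\}$$

$$\ge 1 - P\left\{\bar X + \frac{S}{\sqrt n}t_{n-1,\alpha_1/2} \le \mu \text{ or } \bar X - \frac{S}{\sqrt n}t_{n-1,\alpha_1/2} \ge \mu\right\}$$

$$\quad - P\left\{\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \le \sigma^2 \text{ or } \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} \ge \sigma^2\right\}$$

$$= 1 - \alpha_1 - \alpha_2,$$

so that the Cartesian product,

$$S(\mathbf{X}) = \left(\bar X - \frac{S}{\sqrt n}t_{n-1,\alpha_1/2},\ \bar X + \frac{S}{\sqrt n}t_{n-1,\alpha_1/2}\right) \times \left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}},\ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}}\right),$$

is a $(1 - \alpha_1 - \alpha_2)$-level confidence set for $(\mu, \sigma^2)$.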


11.3 METHODS OF FINDING CONFIDENCE INTERVALS 


We now consider some common methods of constructing confidence sets. The most
common of these is the method of pivots.

Definition 1. Let $X \sim P_\theta$. A random variable $T(X, \theta)$ is known as a pivot if the
distribution of $T(X, \theta)$ does not depend on $\theta$.


In many problems, especially in location and scale problems, pivots are easily
found. For example, in sampling from $f(x - \theta)$, $X_{(n)} - \theta$ is a pivot and so is $\bar X - \theta$.
In sampling from $(1/\sigma) f(x/\sigma)$, a scale family, $X_{(n)}/\sigma$ is a pivot and so is $X_{(1)}/\sigma$,
and in sampling from $(1/\sigma) f((x - \theta)/\sigma)$, a location-scale family, $(\bar X - \theta)/S$ is a
pivot, and so is $(X_{(2)} + X_{(1)} - 2\theta)/S$.

If the DF $F_\theta$ of $X_i$ is continuous, then $F_\theta(X_i) \sim U[0, 1]$ and, in case of random
sampling, we can take

$$T(\mathbf{X}, \theta) = \prod_{i=1}^n F_\theta(X_i),$$

or

$$-\log T(\mathbf{X}, \theta) = -\sum_{i=1}^n \log F_\theta(X_i)$$

as a pivot. Since $F_\theta(X_i) \sim U[0, 1]$, $-\log F_\theta(X_i) \sim G(1, 1)$ and $-\sum_{i=1}^n \log F_\theta(X_i)
\sim G(n, 1)$. It follows that $-\sum_{i=1}^n \log F_\theta(X_i)$ is a pivot.

The following result gives a simple sufficient condition for a pivot to yield a
confidence interval for a real-valued parameter $\theta$.


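Theorem 1. Let $T(\mathbf{X}, \theta)$ be a pivot such that for each $\theta$, $T(\mathbf{X}, \theta)$ is a statistic,
and as a function of $\theta$, $T$ is either strictly increasing or decreasing at each $\mathbf{x} \in \mathcal{R}_n$.
Let $\Lambda \subseteq \mathcal{R}$ be the range of $T$, and for every $\lambda \in \Lambda$ and $\mathbf{x} \in \mathcal{R}_n$, let the equation
$\lambda = T(\mathbf{x}, \theta)$ be solvable. Then one can construct a confidence interval for $\theta$ at any
level.


Proof. Let $0 < \alpha < 1$. Then we can choose a pair of numbers $\lambda_1(\alpha)$ and $\lambda_2(\alpha)$
in $\Lambda$, not necessarily unique, such that

$$P_\theta\{\lambda_1(\alpha) < T(\mathbf{X}, \theta) < \lambda_2(\alpha)\} \ge 1 - \alpha \quad \text{for all } \theta. \tag{1}$$

Since the distribution of $T$ is independent of $\theta$, it is clear that $\lambda_1$ and $\lambda_2$ are
independent of $\theta$. Since, moreover, $T$ is monotone in $\theta$, we can solve the equations

$$T(\mathbf{x}, \theta) = \lambda_1(\alpha) \quad \text{and} \quad T(\mathbf{x}, \theta) = \lambda_2(\alpha) \tag{2}$$

for every $\mathbf{x}$ uniquely for $\theta$. We have

$$P_\theta\{\underline{\theta}(\mathbf{X}) < \theta < \overline{\theta}(\mathbf{X})\} \ge 1 - \alpha \quad \text{for all } \theta, \tag{3}$$

where $\underline{\theta}(\mathbf{X}) < \overline{\theta}(\mathbf{X})$ are RVs. This completes the proof.

Remark 1. The condition that $\lambda = T(\mathbf{x}, \theta)$ be solvable will be satisfied if, for
example, $T$ is continuous and strictly increasing or decreasing as a function of $\theta$
in $\Theta$.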


Note that in the continuous case (that is, when the DF of T is continuous) we can 
find a confidence interval with equality on the right side of (1). In the discrete case, 
however, this is usually not possible. 


Remark 2. Relation (1) is valid even when the assumption of monotonicity of T 
in the theorem is dropped. In that case, inversion of the inequalities may yield a set 
of intervals (random set) S(X) in © instead of a confidence interval. 


Remark 3. The argument used in Theorem 1 can be extended to cover the
multiparameter case, and the method will determine a confidence set for all the parameters
of a distribution.


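Example 1. Let $X_1, X_2, \ldots, X_n \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma$ is unknown and we seek
a $(1 - \alpha)$-level confidence interval for $\mu$. Let us choose

$$T(\mathbf{X}, \mu) = \frac{\bar X - \mu}{S}\sqrt n,$$

where $\bar X$, $S^2$ are the usual sample statistics. The RV $T(\mathbf{X}, \mu)$ has Student's
t-distribution with $n - 1$ d.f., which is independent of $\mu$, and $T(\mathbf{X}, \mu)$, as a function
of $\mu$, is monotone. We can clearly choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ (not necessarily uniquely) so
that

$$P\{\lambda_1(\alpha) < T(\mathbf{X}, \mu) < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \mu.$$

Solving

$$\lambda_i(\alpha) = \frac{\bar X - \mu}{S}\sqrt n,$$

we get

$$\underline{\mu}(\mathbf{X}) = \bar X - \frac{S}{\sqrt n}\lambda_2(\alpha), \quad \overline{\mu}(\mathbf{X}) = \bar X - \frac{S}{\sqrt n}\lambda_1(\alpha),$$

and a $(1 - \alpha)$-level confidence interval is

$$\left(\bar X - \frac{S}{\sqrt n}\lambda_2(\alpha),\ \bar X - \frac{S}{\sqrt n}\lambda_1(\alpha)\right).$$

In practice, one chooses $\lambda_2(\alpha) = -\lambda_1(\alpha) = t_{n-1,\alpha/2}$.

Example 2. Let $X_1, X_2, \ldots, X_n$ be iid with common PDF

$$f_\theta(x) = \exp\{-(x - \theta)\}, \quad x > \theta, \quad \text{and 0 elsewhere}.$$

Then the joint PDF of $\mathbf{X}$ is

$$f(\mathbf{x}; \theta) = \exp\left(-\sum_{i=1}^n x_i + n\theta\right) I_{[x_{(1)} > \theta]}.$$

Clearly, $T(\mathbf{X}, \theta) = X_{(1)} - \theta$ is a pivot. We can choose $\lambda_1(\alpha)$, $\lambda_2(\alpha)$ such that

$$P_\theta\{\lambda_1(\alpha) < X_{(1)} - \theta < \lambda_2(\alpha)\} = 1 - \alpha \quad \text{for all } \theta,$$

which yields $(X_{(1)} - \lambda_2(\alpha),\ X_{(1)} - \lambda_1(\alpha))$ as a $(1 - \alpha)$-level confidence interval for $\theta$.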


Remark 4. In Example 1 we chose $\lambda_2 = -\lambda_1$, whereas in Example 2 we did
not indicate how to choose the pair $(\lambda_1, \lambda_2)$ from an infinite set of solutions to
$P_\theta\{\lambda_1(\alpha) < T(\mathbf{X}, \theta) < \lambda_2(\alpha)\} = 1 - \alpha$. One choice is the equal-tails confidence
interval, which is arrived at by assigning probability $\alpha/2$ to each tail of the distribution
of $T$. This means that we solve

$$\frac{\alpha}{2} = P_\theta\{T(\mathbf{X}, \theta) < \lambda_1\} = P_\theta\{T(\mathbf{X}, \theta) > \lambda_2\}.$$

In Example 1, symmetry of the distribution leads to the choice indicated. In
Example 2, $Y = X_{(1)} - \theta$ has PDF

$$g(y) = n\exp(-ny) \quad \text{for } y > 0,$$

so we choose $(\lambda_1, \lambda_2)$ from

$$P_\theta\{X_{(1)} - \theta < \lambda_1\} = \frac{\alpha}{2} = P_\theta\{X_{(1)} - \theta > \lambda_2\},$$

giving $\lambda_2(\alpha) = -(1/n)\ln(\alpha/2)$ and $\lambda_1(\alpha) = -(1/n)\ln(1 - \alpha/2)$. Yet another method
is to choose $\lambda_1$, $\lambda_2$ in such a way that the resulting confidence interval has smallest
length. We discuss this method in Section 11.4.
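The equal-tails interval for the shifted exponential model of Example 2 can be sketched as follows. Since $P\{Y > y\} = e^{-ny}$ for $Y = X_{(1)} - \theta$, the two cut points are $\lambda_1 = -(1/n)\ln(1 - \alpha/2)$ and $\lambda_2 = -(1/n)\ln(\alpha/2)$. The function name `shifted_exp_interval` and the sample values are made up for illustration.

```python
import math


def shifted_exp_interval(sample, alpha):
    """Equal-tails (1 - alpha)-level interval for theta based on the pivot X_(1) - theta."""
    n = len(sample)
    x_min = min(sample)                          # X_(1), the smallest observation
    lam1 = -math.log(1.0 - alpha / 2.0) / n      # P{Y < lam1} = alpha/2
    lam2 = -math.log(alpha / 2.0) / n            # P{Y > lam2} = alpha/2
    return x_min - lam2, x_min - lam1


sample = [2.3, 3.1, 2.7, 4.0]                    # hypothetical observations
lower, upper = shifted_exp_interval(sample, 0.10)
print(round(lower, 4), round(upper, 4))
```

Both endpoints lie below the sample minimum, as they must, because $\theta$ cannot exceed any observation.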

We next consider the method of test inversion and explore the relationship
between a test of hypothesis for a parameter $\theta$ and confidence interval for $\theta$. Consider
the following example.


Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma_0^2)$, where $\sigma_0$ is known.
In Example 11.2.1 we showed that

$$\left(\bar X - \frac{1}{\sqrt n}z_{\alpha/2}\sigma_0,\ \bar X + \frac{1}{\sqrt n}z_{\alpha/2}\sigma_0\right)$$

is a $(1 - \alpha)$-level confidence interval for $\mu$. If we define a test $\varphi$ that rejects a value
of $\mu = \mu_0$ if and only if $\mu_0$ lies outside this interval, that is, if and only if

$$\frac{\sqrt n\,|\bar X - \mu_0|}{\sigma_0} \ge z_{\alpha/2},$$

then

$$P_{\mu_0}\left\{\frac{\sqrt n\,|\bar X - \mu_0|}{\sigma_0} \ge z_{\alpha/2}\right\} = \alpha,$$

and the test $\varphi$ is a size $\alpha$ test of $\mu = \mu_0$ against the alternatives $\mu \ne \mu_0$.

Conversely, a family of $\alpha$-level tests for the hypothesis $\mu = \mu_0$ generates a family
of confidence intervals for $\mu$ by simply taking, as the confidence interval for $\mu$, the
set of those $\mu_0$ for which one cannot reject $\mu = \mu_0$.

Similarly, we can generate a family of $\alpha$-level tests from a $(1 - \alpha)$-level lower
(or upper) confidence bound. Suppose that we start with the $(1 - \alpha)$-level lower
confidence bound $\bar X - z_\alpha(\sigma_0/\sqrt n)$ for $\mu$. Then, by defining a test $\varphi(\mathbf{X})$ that rejects
$\mu \le \mu_0$ if and only if $\mu_0 < \bar X - z_\alpha(\sigma_0/\sqrt n)$, we get an $\alpha$-level test for a hypothesis
of the form $\mu \le \mu_0$.


Example 3 is a special case of the duality principle proved in Theorem 2 below.
In the following we restrict attention to the case in which the rejection (acceptance)
region of the test is the indicator function of a (Borel-measurable) set, that is, we
consider only nonrandomized tests (and confidence intervals). For notational
convenience we write $H_0(\theta_0)$ for the hypothesis $H_0\colon \theta = \theta_0$ and $H_1(\theta_0)$ for the alternative
hypothesis, which may be one- or two-sided.


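Theorem 2. Let $A(\theta_0)$, $\theta_0 \in \Theta$, denote the region of acceptance of an $\alpha$-level
test of $H_0(\theta_0)$. For each observation $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, let $S(\mathbf{x})$ denote the set

$$S(\mathbf{x}) = \{\theta\colon \mathbf{x} \in A(\theta),\ \theta \in \Theta\}. \tag{4}$$

Then $S(\mathbf{x})$ is a family of confidence sets for $\theta$ at confidence level $1 - \alpha$. If, moreover,
$A(\theta_0)$ is UMP for the problem $(\alpha, H_0(\theta_0), H_1(\theta_0))$, then $S(\mathbf{X})$ minimizes

$$P_\theta\{S(\mathbf{X}) \ni \theta'\} \quad \text{for all } \theta \in H_1(\theta') \tag{5}$$

among all $(1 - \alpha)$-level families of confidence sets. That is, $S(\mathbf{X})$ is UMA.

Proof. We have

$$S(\mathbf{x}) \ni \theta \quad \text{if and only if} \quad \mathbf{x} \in A(\theta), \tag{6}$$

so that

$$P_\theta\{S(\mathbf{X}) \ni \theta\} = P_\theta\{\mathbf{X} \in A(\theta)\} \ge 1 - \alpha,$$

as asserted.

If $S^*(\mathbf{X})$ is any other family of $(1 - \alpha)$-level confidence sets, let $A^*(\theta) =
\{\mathbf{x}\colon S^*(\mathbf{x}) \ni \theta\}$. Then

$$P_\theta\{\mathbf{X} \in A^*(\theta)\} = P_\theta\{S^*(\mathbf{X}) \ni \theta\} \ge 1 - \alpha;$$

and since $A(\theta_0)$ is UMP for $(\alpha, H_0(\theta_0), H_1(\theta_0))$, it follows that

$$P_\theta\{\mathbf{X} \in A^*(\theta_0)\} \ge P_\theta\{\mathbf{X} \in A(\theta_0)\} \quad \text{for any } \theta \in H_1(\theta_0).$$

Hence

$$P_\theta\{S^*(\mathbf{X}) \ni \theta_0\} \ge P_\theta\{\mathbf{X} \in A(\theta_0)\} = P_\theta\{S(\mathbf{X}) \ni \theta_0\}$$

for all $\theta \in H_1(\theta_0)$. This completes the proof.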


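Example 4. Let $X$ be an RV of the continuous type with one-parameter exponential
PDF given by

$$f_\theta(x) = \exp[Q(\theta)T(x) + S'(x) + D(\theta)],$$

where $Q(\theta)$ is a nondecreasing function of $\theta$. Let $H_0\colon \theta = \theta_0$ and $H_1\colon \theta < \theta_0$. Then
the acceptance region of a UMP size $\alpha$ test of $H_0$ is of the form

$$A(\theta_0) = \{x\colon T(x) > c(\theta_0)\}.$$

Since for $\theta \ge \theta'$,

$$P_{\theta'}\{T(X) \le c(\theta')\} = \alpha = P_\theta\{T(X) \le c(\theta)\} \le P_{\theta'}\{T(X) \le c(\theta)\},$$

$c(\theta)$ may be chosen to be nondecreasing. (The last inequality follows because the
power of the UMP test is at least $\alpha$, the size.) We have

$$S(x) = \{\theta\colon x \in A(\theta)\},$$

so that $S(x)$ is of the form $(-\infty, c^{-1}(T(x)))$ or $(-\infty, c^{-1}(T(x))]$, where $c^{-1}$ is
defined by

$$c^{-1}(T(x)) = \sup_\theta\{\theta\colon c(\theta) < T(x)\}.$$

In particular, if $X_1, X_2, \ldots, X_n$ is a sample from

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta}e^{-x/\theta}, & x > 0,\\ 0, & \text{otherwise}, \end{cases}$$

then $T(\mathbf{x}) = \sum_{i=1}^n x_i$; and for testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta < \theta_0$, the UMP
acceptance region is of the form

$$A(\theta_0) = \left\{\mathbf{x}\colon \sum_{i=1}^n x_i \ge c(\theta_0)\right\},$$

where $c(\theta_0)$ is the unique solution of

$$\int_{c(\theta_0)/\theta_0}^\infty \frac{y^{n-1}}{(n-1)!}e^{-y}\,dy = 1 - \alpha, \quad 0 < \alpha < 1.$$

The UMA family of $(1 - \alpha)$-level confidence sets is of the form

$$S(\mathbf{x}) = \{\theta\colon \mathbf{x} \in A(\theta)\}.$$

In the case $n = 1$,

$$c(\theta_0) = \theta_0\log\left(\frac{1}{1 - \alpha}\right) \quad \text{and} \quad S(x) = \left[0,\ \frac{x}{\log[1/(1 - \alpha)]}\right].$$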
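Example 5. Let $X_1, X_2, \ldots, X_n$ be iid $U(0, \theta)$ RVs. In Problem 9.4.3 we asked
the reader to show that the test

$$\varphi(\mathbf{x}) = \begin{cases} 1 & \text{if } x_{(n)} > \theta_0 \text{ or } x_{(n)} \le \theta_0\alpha^{1/n},\\ 0 & \text{otherwise}, \end{cases}$$

is UMP size $\alpha$ test of $\theta = \theta_0$ against $\theta \ne \theta_0$. Then

$$A(\theta_0) = \{\mathbf{x}\colon \theta_0\alpha^{1/n} < x_{(n)} \le \theta_0\},$$

and it follows that $[x_{(n)},\ x_{(n)}\alpha^{-1/n})$ is a $(1 - \alpha)$-level UMA confidence interval for $\theta$.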


The third method we consider is based on Bayesian analysis, where we take into
account any prior knowledge that the experimenter has about $\theta$. This is reflected in
the specification of the prior distribution $\pi(\theta)$ on $\Theta$. Under this setup the claims of
probability of coverage are based not on the distribution of $X$ but on the conditional
distribution of $\theta$ given $X = x$, the posterior distribution of $\theta$.

Let $\Theta$ be the parameter set, and let the observable RV $X$ have PDF (PMF) $f_\theta(x)$.
Suppose that we consider $\theta$ as an RV with distribution $\pi(\theta)$ on $\Theta$. Then $f_\theta(x)$ can be
considered as the conditional PDF (PMF) of $X$, given that the RV $\theta$ takes the value $\theta$.
Note that we are using the same symbol for the RV $\theta$ and the value that it assumes.
We can determine the joint distribution of $X$ and $\theta$, the marginal distribution of $X$,
and also the conditional distribution of $\theta$, given $X = x$, as usual. Thus the joint
distribution is given by
\[ f(x, \theta) = \pi(\theta) f_\theta(x), \tag{7} \]
and the marginal distribution of $X$ by
\[ g(x) = \begin{cases} \sum_\theta \pi(\theta) f_\theta(x) & \text{if } \pi \text{ is a PMF,} \\ \int \pi(\theta) f_\theta(x)\, d\theta & \text{if } \pi \text{ is a PDF.} \end{cases} \tag{8} \]
The conditional distribution of $\theta$, given that $x$ is observed, is given by
\[ h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{g(x)}, \qquad g(x) > 0. \tag{9} \]
Given $h(\theta \mid x)$, it is easy to find functions $l(x)$, $u(x)$ such that
\[ P\{l(X) < \theta < u(X)\} \ge 1 - \alpha, \]
where
\[ P\{l(X) < \theta < u(X) \mid X = x\} = \begin{cases} \int_{l}^{u} h(\theta \mid x)\, d\theta, \\ \sum_{l < \theta < u} h(\theta \mid x), \end{cases} \tag{10} \]
depending on whether $h$ is a PDF or a PMF.


Definition 2. An interval $(l(x), u(x))$ that has probability at least $1 - \alpha$ of including
$\theta$ is called a $(1 - \alpha)$-level Bayes interval for $\theta$. Also, $l(x)$ and $u(x)$ are called the
lower and upper limits of the interval.


One can similarly define one-sided Bayes intervals or $(1 - \alpha)$-level lower and
upper Bayes limits.


Remark 5. We note that under the Bayesian setup, we can say that $\theta$ lies in the
interval $(l(x), u(x))$ with probability $1 - \alpha$ because $l$ and $u$ are
computed from the posterior distribution of $\theta$ given $x$. To emphasize this distinction
between Bayesian and classical analysis, some authors prefer the term credible
sets for Bayesian confidence sets.


Example 6. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, 1)$, $\mu \in \mathbb{R}$, and let the a priori
distribution of $\mu$ be $N(0, 1)$. Then from Example 8.8.6 we know that $h(\mu \mid x)$ is
\[ N\Big(\frac{\sum_{i=1}^n x_i}{n+1}, \frac{1}{n+1}\Big). \]
Thus a $(1 - \alpha)$-level Bayesian confidence interval is
\[ \Big(\frac{n\bar{x}}{n+1} - \frac{z_{\alpha/2}}{\sqrt{n+1}},\ \frac{n\bar{x}}{n+1} + \frac{z_{\alpha/2}}{\sqrt{n+1}}\Big). \]
A $(1 - \alpha)$-level confidence interval for $\mu$ (treating $\mu$ as fixed) is a random interval
with value
\[ \Big(\bar{x} - \frac{z_{\alpha/2}}{\sqrt{n}},\ \bar{x} + \frac{z_{\alpha/2}}{\sqrt{n}}\Big). \]
Thus the Bayesian interval is somewhat shorter in length. This is to be expected since
we assumed more in the Bayesian case.
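The two intervals of Example 6 are easy to compare numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample values in the usage lines are illustrative assumptions, not taken from the text.

```python
import math
from statistics import NormalDist


def upper_normal_point(alpha):
    """z_{alpha/2}: the point with upper-tail probability alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def bayes_interval(xs, alpha):
    """Interval from the N(sum(x)/(n+1), 1/(n+1)) posterior under the N(0, 1) prior."""
    n = len(xs)
    center = sum(xs) / (n + 1)
    half = upper_normal_point(alpha) / math.sqrt(n + 1)
    return center - half, center + half


def classical_interval(xs, alpha):
    """Usual interval xbar -/+ z_{alpha/2}/sqrt(n) for N(mu, 1) data."""
    n = len(xs)
    center = sum(xs) / n
    half = upper_normal_point(alpha) / math.sqrt(n)
    return center - half, center + half


if __name__ == "__main__":
    data = [0.5, 1.5, 1.0, 2.0]  # hypothetical observations
    print(bayes_interval(data, 0.05))
    print(classical_interval(data, 0.05))
```

The ratio of the two lengths is $\sqrt{n/(n+1)}$, and the Bayesian center $n\bar{x}/(n+1)$ is pulled toward the prior mean 0.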


Example 7. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and let the prior distribution
on $\Theta = (0, 1)$ be $U(0, 1)$. A simple computation shows that the posterior PDF of $p$,
given $x$, is
\[ h(p \mid x) = \begin{cases} \dfrac{p^{\sum x_i}(1 - p)^{n - \sum x_i}}{B(\sum x_i + 1,\ n - \sum x_i + 1)}, & 0 < p < 1, \\ 0, & \text{otherwise.} \end{cases} \]
Given a table of incomplete beta integrals and the observed value of $\sum_{i=1}^n x_i$, one
can easily construct a Bayesian confidence interval for $p$.


Finally, we consider some large-sample methods of constructing confidence intervals.
Suppose that $T(X) \sim AN(\theta, v(\theta)/n)$. Then
\[ \sqrt{n}\,\frac{T(X) - \theta}{\sqrt{v(\theta)}} \xrightarrow{L} Z, \]
where $Z \sim N(0, 1)$. Suppose further that there is a statistic $S(X)$ such that
$S(X) \xrightarrow{P} v(\theta)$. Then, by Slutsky's theorem,
\[ \sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}} \xrightarrow{L} Z, \]
and we can obtain an (approximate) $(1 - \alpha)$-level confidence interval for $\theta$ by inverting
the inequality
\[ \Big|\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}}\Big| \le z_{\alpha/2}. \]


Example 8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with finite variance. Also, let $EX_i =
\mu$ and $EX_i^2 = \sigma^2 + \mu^2$. From the CLT it follows that
\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{L} Z, \]
where $Z \sim N(0, 1)$. Suppose that we want a $(1 - \alpha)$-level confidence interval for
$\mu$ when $\sigma$ is not known. Since $S \xrightarrow{P} \sigma$, for large $n$ the quantity $[\sqrt{n}(\bar{X} - \mu)/S]$ is
approximately normally distributed with mean 0 and variance 1. Hence, for large $n$,
we can find constants $c_1$, $c_2$ such that
\[ P\Big\{c_1 \le \frac{\bar{X} - \mu}{S}\sqrt{n} \le c_2\Big\} = 1 - \alpha. \]
In particular, we can choose $-c_1 = c_2 = z_{\alpha/2}$ to give
\[ \Big(\bar{X} - \frac{S}{\sqrt{n}} z_{\alpha/2},\ \bar{X} + \frac{S}{\sqrt{n}} z_{\alpha/2}\Big) \]
as an approximate $(1 - \alpha)$-level confidence interval for $\mu$.
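As an illustration of Example 8, the sketch below computes $\bar{X} \pm z_{\alpha/2} S/\sqrt{n}$ with the Python standard library. It assumes that $S$ is the sample standard deviation with divisor $n - 1$ (which is what `statistics.stdev` computes); the function name and the data in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def large_sample_interval(xs, alpha):
    """Approximate (1 - alpha)-level interval xbar -/+ z_{alpha/2} * S / sqrt(n)."""
    n = len(xs)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * stdev(xs) / math.sqrt(n)  # stdev uses the n - 1 divisor
    center = mean(xs)
    return center - half, center + half


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]  # hypothetical data; the approximation is meant for large n
    print(large_sample_interval(sample, 0.05))
```

The interval is justified only asymptotically, so for a sample as small as the one in the usage lines it shows the arithmetic rather than a trustworthy confidence statement.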


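Recall that if $\hat{\theta}$ is the MLE of $\theta$ and the conditions of Theorem 8.7.4 or 8.7.5 are
satisfied (caution: see Remark 8.7.4), then
\[ \frac{\sqrt{n}(\hat{\theta} - \theta)}{\sigma} \xrightarrow{L} N(0, 1) \quad \text{as } n \to \infty, \]
where
\[ \sigma^2 = \Big\{E_\theta\Big[\frac{\partial \log f_\theta(X)}{\partial \theta}\Big]^2\Big\}^{-1} = \frac{1}{I(\theta)}. \]
Then we can invert the statement
\[ P_\theta\Big\{-z_{\alpha/2} \le \frac{\sqrt{n}(\hat{\theta} - \theta)}{\sigma} \le z_{\alpha/2}\Big\} \approx 1 - \alpha \]
to give an approximate $(1 - \alpha)$-level confidence interval for $\theta$.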

Yet another possible procedure has universal applicability and hence can be used
for large or small samples. Unfortunately, however, this procedure usually yields
confidence intervals that are much too large in length. The method employs the
well-known Chebychev inequality (see Section 3.4):
\[ P\big\{|X - EX| < \varepsilon\sqrt{\operatorname{var}(X)}\big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
If $\hat{\theta}$ is an estimate of $\theta$ (not necessarily unbiased) with finite variance $\sigma^2(\theta)$, then by
Chebychev's inequality
\[ P\Big\{|\hat{\theta} - \theta| < \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\Big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
It follows that
\[ \Big(\hat{\theta} - \varepsilon\sqrt{E(\hat{\theta} - \theta)^2},\ \hat{\theta} + \varepsilon\sqrt{E(\hat{\theta} - \theta)^2}\Big) \]
is a $[1 - (1/\varepsilon^2)]$-level confidence interval for $\theta$. Under some mild consistency
conditions one can replace the normalizing constant $\sqrt{E(\hat{\theta} - \theta)^2}$, which will be some
function $\lambda(\theta)$ of $\theta$, by $\lambda(\hat{\theta})$.
Note that the estimator $\hat{\theta}$ need not have a limiting normal law.


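Example 9. Let $X_1, X_2, \ldots, X_n$ be iid $b(1, p)$ RVs, and suppose it is required to find a
confidence interval for $p$. We know that $E\bar{X} = p$, and
\[ \operatorname{var}(\bar{X}) = \frac{\operatorname{var}(X)}{n} = \frac{p(1 - p)}{n}. \]
It follows that
\[ P\Big\{|\bar{X} - p| < \varepsilon\sqrt{\frac{p(1 - p)}{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
Since $p(1 - p) \le \frac{1}{4}$, we have
\[ P\Big\{\bar{X} - \frac{1}{2\sqrt{n}}\varepsilon < p < \bar{X} + \frac{1}{2\sqrt{n}}\varepsilon\Big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
One can now choose $\varepsilon$ and $n$ or, if $n$ is kept constant at a given number, $\varepsilon$ to get
the desired level.
Actually, the confidence interval obtained above can be improved somewhat. We
note that
\[ P\Big\{|\bar{X} - p| < \varepsilon\sqrt{\frac{p(1 - p)}{n}}\Big\} \ge 1 - \frac{1}{\varepsilon^2}, \]
so that
\[ P\Big\{|\bar{X} - p|^2 < \frac{\varepsilon^2 p(1 - p)}{n}\Big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
Now
\[ |\bar{X} - p|^2 < \frac{\varepsilon^2}{n}\, p(1 - p) \]
if and only if
\[ \Big(1 + \frac{\varepsilon^2}{n}\Big)p^2 - \Big(2\bar{X} + \frac{\varepsilon^2}{n}\Big)p + \bar{X}^2 < 0. \]
This last inequality holds if and only if $p$ lies between the two roots of the quadratic
equation
\[ \Big(1 + \frac{\varepsilon^2}{n}\Big)p^2 - \Big(2\bar{X} + \frac{\varepsilon^2}{n}\Big)p + \bar{X}^2 = 0. \]
The two roots are
\[ p_1 = \frac{2\bar{X} + (\varepsilon^2/n) - \sqrt{[2\bar{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar{X}^2}}{2[1 + (\varepsilon^2/n)]}
 = \frac{\bar{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) - \sqrt{4(\varepsilon^2/n)\bar{X}(1 - \bar{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]} \]
and
\[ p_2 = \frac{2\bar{X} + (\varepsilon^2/n) + \sqrt{[2\bar{X} + (\varepsilon^2/n)]^2 - 4[1 + (\varepsilon^2/n)]\bar{X}^2}}{2[1 + (\varepsilon^2/n)]}
 = \frac{\bar{X}}{1 + (\varepsilon^2/n)} + \frac{(\varepsilon^2/n) + \sqrt{4(\varepsilon^2/n)\bar{X}(1 - \bar{X}) + (\varepsilon^4/n^2)}}{2[1 + (\varepsilon^2/n)]}. \]
It follows that
\[ P\{p_1 < p < p_2\} \ge 1 - \frac{1}{\varepsilon^2}. \]
Note that when $n$ is large,
\[ p_1 \approx \bar{X} - \varepsilon\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}}, \qquad p_2 \approx \bar{X} + \varepsilon\sqrt{\frac{\bar{X}(1 - \bar{X})}{n}}, \]
as one should expect in view of the fact that $\bar{X} \to p$ with probability 1 and
$\sqrt{\bar{X}(1 - \bar{X})/n}$ estimates $\sqrt{p(1 - p)/n}$. Alternatively, we could have used the
CLT (or large-sample property of the MLE) to arrive at the same result but with $\varepsilon$
replaced by $z_{\alpha/2}$.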
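The two Chebychev-based intervals of Example 9 can be compared with a few lines of arithmetic. The sketch below is illustrative only: the function names and the values of $\bar{x}$, $n$, and $\alpha$ in the usage lines are assumptions, with $\varepsilon = 1/\sqrt{\alpha}$ chosen so that $1 - 1/\varepsilon^2 = 1 - \alpha$.

```python
import math


def chebyshev_simple(xbar, n, eps):
    """Interval xbar -/+ eps / (2 sqrt(n)), obtained from the bound p(1 - p) <= 1/4."""
    half = eps / (2.0 * math.sqrt(n))
    return xbar - half, xbar + half


def chebyshev_quadratic(xbar, n, eps):
    """Roots p1 < p2 of (1 + a) p^2 - (2 xbar + a) p + xbar^2 = 0, where a = eps^2 / n."""
    a = eps * eps / n
    disc = 4.0 * a * xbar * (1.0 - xbar) + a * a  # simplified discriminant
    center = (2.0 * xbar + a) / (2.0 * (1.0 + a))
    half = math.sqrt(disc) / (2.0 * (1.0 + a))
    return center - half, center + half


if __name__ == "__main__":
    alpha = 0.05
    eps = 1.0 / math.sqrt(alpha)
    print(chebyshev_simple(0.3, 100, eps))
    print(chebyshev_quadratic(0.3, 100, eps))
```

The quadratic interval is always the shorter of the two, since its squared half-length is $[4a\bar{x}(1-\bar{x}) + a^2]/[4(1+a)^2] \le a/[4(1+a)] < a/4$ with $a = \varepsilon^2/n$.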


Example 10. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. We seek a confidence
interval for the parameter $\theta$. The estimator $\hat{\theta} = X_{(n)}$ is the MLE of $\theta$, which
is also sufficient for $\theta$. From Example 5, $[X_{(n)}, \alpha^{-1/n}X_{(n)}]$ is a $(1 - \alpha)$-level UMA
confidence interval for $\theta$.

Let us now apply the method of Chebychev's inequality to the same problem. We
have
\[ E_\theta X_{(n)} = \frac{n}{n+1}\,\theta \]
and
\[ E_\theta(X_{(n)} - \theta)^2 = \theta^2\,\frac{2}{(n+1)(n+2)}. \]
Thus
\[ P\Big\{\frac{|X_{(n)} - \theta|}{\theta}\sqrt{\frac{(n+1)(n+2)}{2}} < \varepsilon\Big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
Since $X_{(n)} \xrightarrow{P} \theta$, we replace $\theta$ by $X_{(n)}$ in the denominator, and for moderately
large $n$,
\[ P\Big\{\frac{|X_{(n)} - \theta|}{X_{(n)}}\sqrt{\frac{(n+1)(n+2)}{2}} < \varepsilon\Big\} \ge 1 - \frac{1}{\varepsilon^2}. \]
It follows that
\[ \Big(X_{(n)} - \varepsilon X_{(n)}\frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}},\ X_{(n)} + \varepsilon X_{(n)}\frac{\sqrt{2}}{\sqrt{(n+1)(n+2)}}\Big) \]
is a $1 - (1/\varepsilon^2)$ confidence interval for $\theta$. Choosing $1 - (1/\varepsilon^2) = 1 - \alpha$, or $\varepsilon =
1/\sqrt{\alpha}$, and noting that $1/\sqrt{(n+1)(n+2)} \approx 1/n$ for large $n$, and the fact that
with probability 1, $X_{(n)} \le \theta$, we can use the approximate confidence interval
\[ \Big(X_{(n)},\ X_{(n)}\Big(1 + \frac{1}{n}\sqrt{\frac{2}{\alpha}}\Big)\Big) \]
for $\theta$.


In the examples given above we see that for a given confidence level $1 - \alpha$, a
wide choice of confidence intervals is available. Clearly, the larger the interval, the
better the chance of trapping a true parameter value. Thus the interval $(-\infty, +\infty)$,
which ignores the data completely, will include the real-valued parameter $\theta$ with
confidence level 1. However, the larger the confidence interval, the less meaningful
it is. Therefore, for a given confidence level $1 - \alpha$, it is desirable to choose the shortest
possible confidence interval. Since the length $\bar{\theta} - \underline{\theta}$, in general, is a random variable,
one can show that a confidence interval of level $1 - \alpha$ with uniformly minimum length
among all such intervals does not exist in most cases. The alternative, to minimize
$E_\theta(\bar{\theta} - \underline{\theta})$, is also quite unsatisfactory. In the next section we consider the problem
of finding a shortest-length confidence interval based on a suitable statistic.


PROBLEMS 11.3 


1. A sample of size 25 from a normal population with variance 81 produced a mean
of 81.2. Find a 0.95 level confidence interval for the mean $\mu$.


2. Let $\bar{X}$ be the mean of a random sample of size $n$ from $N(\mu, 16)$. Find the smallest
sample size $n$ such that $(\bar{X} - 1, \bar{X} + 1)$ is a 0.90 level confidence interval for $\mu$.


3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from
$N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Find a confidence interval for $\mu_1 - \mu_2$
at confidence level $1 - \alpha$ when (a) $\sigma$ is known, and (b) $\sigma$ is unknown.


4. Two independent samples, each of size 7, from normal populations with common
unknown variance $\sigma^2$ produced sample means 4.8 and 5.4 and sample
variances 8.38 and 7.62, respectively. Find a 0.95-level confidence interval for
$\mu_1 - \mu_2$, the difference between the means of samples 1 and 2.


5. In Problem 3, suppose that the first population has variance $\sigma_1^2$ and the second
population has variance $\sigma_2^2$, where both $\sigma_1^2$ and $\sigma_2^2$ are known. Find a $(1 - \alpha)$-level
confidence interval for $\mu_1 - \mu_2$. What happens if both $\sigma_1^2$ and $\sigma_2^2$ are
unknown and unequal?


6. In Problem 5, find a confidence interval for the ratio $\sigma_2^2/\sigma_1^2$, both when $\mu_1$, $\mu_2$
are known and when $\mu_1$, $\mu_2$ are unknown. What happens if either $\mu_1$ or $\mu_2$ is
unknown but the other is known?


7. Let $X_1, X_2, \ldots, X_n$ be a sample from a $G(1, \beta)$ distribution. Find a confidence
interval for the parameter $\beta$ with confidence level $1 - \alpha$.


8. (a) Use the large-sample properties of the MLE to construct a $(1 - \alpha)$-level
confidence interval for the parameter $\theta$ in each of the following cases:
(i) $X_1, X_2, \ldots, X_n$ is a sample from $G(1, 1/\theta)$, and (ii) $X_1, X_2, \ldots, X_n$ is
a sample from $P(\theta)$.

(b) In part (a), use Chebychev's inequality to do the same.


9. For a sample of size 1 from the population
\[ f_\theta(x) = \frac{2}{\theta^2}(\theta - x), \qquad 0 < x < \theta, \]
find a $(1 - \alpha)$-level confidence interval for $\theta$.


10. Let $X_1, X_2, \ldots, X_n$ be a sample from the uniform distribution on $N$ points. Find
an upper $(1 - \alpha)$-level confidence bound for $N$, based on $\max(X_1, X_2, \ldots, X_n)$.


11. In Example 10, find the smallest $n$ such that the length of the $(1 - \alpha)$-level
confidence interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ is less than $d$, provided it is known that $\theta \le a$,
where $a$ is a known constant.


12. Let $X$ and $Y$ be independent RVs with PDFs $\lambda e^{-\lambda x}$ $(x > 0)$ and $\mu e^{-\mu y}$ $(y > 0)$,
respectively. Find a $(1 - \alpha)$-level confidence region for $(\lambda, \mu)$ of the form
$\{(\lambda, \mu) : \lambda X + \mu Y \le k\}$.


13. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Find a
UMA $(1 - \alpha)$-level upper confidence bound for $\mu$.


14. Let $X_1, X_2, \ldots, X_n$ be a sample from a Poisson distribution with unknown parameter
$\lambda$. Assuming that $\lambda$ is a value assumed by a $G(\alpha, \beta)$ RV, find a Bayesian
confidence interval for $\lambda$.


15. Let $X_1, X_2, \ldots, X_n$ be a sample from a geometric distribution with parameter
$\theta$. Assuming that $\theta$ has a priori PDF that is given by the density of a $B(\alpha, \beta)$
RV, find a Bayesian confidence interval for $\theta$.


16. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, 1)$, and suppose that the a priori
PDF for $\mu$ is $U(-1, 1)$. Find a Bayesian confidence interval for $\mu$.


11.4 SHORTEST-LENGTH CONFIDENCE INTERVALS 


We have already remarked that we can increase the confidence level simply by taking
a longer-length confidence interval. Indeed, the worthless interval $-\infty < \theta < \infty$,
which simply says that $\theta$ is a point on the real line, has confidence level 1. In practice,
one would like to set the level at a given fixed number $1 - \alpha$ $(0 < \alpha < 1)$ and,
if possible, construct an interval as short as possible among all confidence intervals
with the same level. Such an interval is desirable since it is more informative. We
have already remarked that shortest-length confidence intervals do not always exist.
In this section we investigate the possibility of constructing shortest-length confidence
intervals based on simple RVs. The discussion here is based on Guenther [34].
Theorem 11.3.1 is really the key to the following discussion.

Let $X_1, X_2, \ldots, X_n$ be a sample from a PDF $f_\theta(x)$, and let $T(X_1, X_2, \ldots, X_n, \theta) =
T_\theta$ be a pivot for $\theta$. Also, let $\lambda_1 = \lambda_1(\alpha)$, $\lambda_2 = \lambda_2(\alpha)$ be chosen so that
\[ P\{\lambda_1 < T_\theta < \lambda_2\} = 1 - \alpha, \tag{1} \]
and suppose that (1) can be rewritten as
\[ P\{\underline{\theta}(X) < \theta < \bar{\theta}(X)\} = 1 - \alpha. \tag{2} \]
For every $T_\theta$, $\lambda_1$ and $\lambda_2$ can be chosen in many ways. We would like to choose
$\lambda_1$ and $\lambda_2$ so that $\bar{\theta} - \underline{\theta}$ is minimum. Such an interval is a $(1 - \alpha)$-level shortest-length
confidence interval based on $T_\theta$. It may be possible, however, to find another
RV $T_\theta'$ that may yield an even shorter interval. Therefore, we are not asserting that
the procedure, if it succeeds, will lead to a $(1 - \alpha)$-level confidence interval that has
shortest length among all intervals of this level. For $T_\theta$ we use the simplest RV that
is a function of a sufficient statistic and $\theta$.


Remark 1. An alternative to minimizing the length of the confidence interval
is to minimize the expected length $E_\theta\{\bar{\theta}(X) - \underline{\theta}(X)\}$. Unfortunately, this also is
quite unsatisfactory since, in general, there does not exist a member of the class of
all $(1 - \alpha)$-level confidence intervals that minimizes $E_\theta\{\bar{\theta}(X) - \underline{\theta}(X)\}$ for all $\theta$.
The procedures applied in finding the shortest-length confidence interval based on a
pivot are also applicable in finding an interval that minimizes the expected length. We
remark here that the restriction to unbiased confidence intervals is natural if we wish
to minimize $E_\theta\{\bar{\theta}(X) - \underline{\theta}(X)\}$. See Section 11.5 for definitions and further details.


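Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known.
Then $\bar{X}$ is sufficient for $\mu$, and we take
\[ T_\mu(X) = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}. \]
Then
\[ 1 - \alpha = P\Big\{a < \frac{\bar{X} - \mu}{\sigma}\sqrt{n} < b\Big\} = P\Big\{\bar{X} - b\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} - a\frac{\sigma}{\sqrt{n}}\Big\}. \]
The length of this confidence interval is $(\sigma/\sqrt{n})(b - a)$. We wish to minimize $L =
(\sigma/\sqrt{n})(b - a)$ such that
\[ \Phi(b) - \Phi(a) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\, dx = \int_a^b \varphi(x)\, dx = 1 - \alpha. \]
Here $\varphi$ and $\Phi$, respectively, are the PDF and DF of an $N(0, 1)$ RV. Thus
\[ \frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\Big(\frac{db}{da} - 1\Big) \]
and
\[ \varphi(b)\frac{db}{da} - \varphi(a) = 0, \]
giving
\[ \frac{dL}{da} = \frac{\sigma}{\sqrt{n}}\Big[\frac{\varphi(a)}{\varphi(b)} - 1\Big]. \]
The minimum occurs when $\varphi(a) = \varphi(b)$, that is, when $a = b$ or $a = -b$. Since
$a = b$ does not satisfy
\[ \int_a^b \varphi(t)\, dt = 1 - \alpha, \]
we choose $a = -b$. The shortest confidence interval based on $T_\mu$ is therefore the
equal-tails interval,
\[ \Big(\bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big) \quad \text{or} \quad \Big(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big). \]
The length of this interval is $2z_{\alpha/2}(\sigma/\sqrt{n})$. In this case we can plan our experiment
to give a prescribed confidence level and a prescribed length for the interval. To have
level $1 - \alpha$ and length $\le 2d$, we choose the smallest $n$ such that
\[ d \ge z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad n \ge z_{\alpha/2}^2\frac{\sigma^2}{d^2}. \]
This can also be interpreted as follows. If we estimate $\mu$ by $\bar{X}$, taking a sample of
size $n \ge z_{\alpha/2}^2(\sigma^2/d^2)$, we are $100(1 - \alpha)$ percent confident that the error in our
estimate is at most $d$.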
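The sample-size rule $n \ge z_{\alpha/2}^2\sigma^2/d^2$ from Example 1 translates directly into a short computation. The following is a sketch under the assumptions of that example (normal data, known $\sigma$); the function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, d, alpha):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d, i.e. n >= (z * sigma / d)^2."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / d) ** 2)


if __name__ == "__main__":
    # Hypothetical planning values: sigma = 9, half-length d = 2, level 0.95.
    print(min_sample_size(sigma=9.0, d=2.0, alpha=0.05))
```

Rounding up with `math.ceil` is what guarantees that the half-length $z_{\alpha/2}\sigma/\sqrt{n}$ does not exceed $d$.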


Example 2. In Example 1, suppose that $\sigma$ is unknown. In that case we use
\[ T_\mu(X) = \frac{\bar{X} - \mu}{S}\sqrt{n} \]
as a pivot. $T_\mu$ has Student's $t$-distribution with $n - 1$ d.f. Thus
\[ 1 - \alpha = P\Big\{a < \frac{\bar{X} - \mu}{S}\sqrt{n} < b\Big\} = P\Big\{\bar{X} - b\frac{S}{\sqrt{n}} < \mu < \bar{X} - a\frac{S}{\sqrt{n}}\Big\}. \]
We wish to minimize
\[ L = (b - a)\frac{S}{\sqrt{n}} \]
subject to
\[ \int_a^b f_{n-1}(t)\, dt = 1 - \alpha, \]
where $f_{n-1}(t)$ is the PDF of $T_\mu$. We have
\[ \frac{dL}{da} = \Big(\frac{db}{da} - 1\Big)\frac{S}{\sqrt{n}} \quad \text{and} \quad f_{n-1}(b)\frac{db}{da} - f_{n-1}(a) = 0, \]
giving
\[ \frac{dL}{da} = \Big[\frac{f_{n-1}(a)}{f_{n-1}(b)} - 1\Big]\frac{S}{\sqrt{n}}. \]
It follows that the minimum occurs at $a = -b$ (the other solution, $a = b$, is not
admissible). The shortest-length confidence interval based on $T_\mu$ is the equal-tails
interval,
\[ \Big(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\Big). \]
The length of this interval is $2t_{n-1,\alpha/2}(S/\sqrt{n})$, which, being random, may be arbitrarily
large. Note that the same confidence interval minimizes the expected length
of the interval, namely, $EL = (b - a)c_n(\sigma/\sqrt{n})$, where $c_n$ is a constant determined
from $ES = c_n\sigma$, and the minimum expected length is $2t_{n-1,\alpha/2}c_n(\sigma/\sqrt{n})$.

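Example 3. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs. Suppose that $\mu$ is known
and we want a confidence interval for $\sigma^2$. The obvious choice for a pivot $T_{\sigma^2}$ is given
by
\[ T_{\sigma^2}(X) = \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2}, \]
which has a chi-square distribution with $n$ d.f. Now
\[ P\Big\{a < \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} < b\Big\} = 1 - \alpha, \]
so that
\[ P\Big\{\frac{\sum_{i=1}^n (X_i - \mu)^2}{b} < \sigma^2 < \frac{\sum_{i=1}^n (X_i - \mu)^2}{a}\Big\} = 1 - \alpha. \]
We wish to minimize
\[ L = \Big(\frac{1}{a} - \frac{1}{b}\Big)\sum_{i=1}^n (X_i - \mu)^2 \]
subject to
\[ \int_a^b f_n(t)\, dt = 1 - \alpha, \]
where $f_n$ is the PDF of a chi-square RV with $n$ d.f. We have
\[ \frac{dL}{da} = \Big(-\frac{1}{a^2} + \frac{1}{b^2}\frac{db}{da}\Big)\sum_{i=1}^n (X_i - \mu)^2 \]
and
\[ \frac{db}{da} = \frac{f_n(a)}{f_n(b)}, \]
so that
\[ \frac{dL}{da} = \Big[-\frac{1}{a^2} + \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}\Big]\sum_{i=1}^n (X_i - \mu)^2, \]
which vanishes if
\[ \frac{1}{a^2} = \frac{1}{b^2}\frac{f_n(a)}{f_n(b)}. \]
Numerical results giving values of $a$ and $b$ to four significant places of decimals are
available (see Tate and Klett [111]). In practice, the simpler equal-tails interval,
\[ \Big(\frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\alpha/2}},\ \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}}\Big), \]
may be used.
If $\mu$ is unknown, we use
\[ T_{\sigma^2}(X) = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} = (n - 1)\frac{S^2}{\sigma^2} \]
as a pivot. $T_{\sigma^2}$ has a $\chi^2(n - 1)$ distribution. Proceeding as above, we can show that the
shortest-length confidence interval based on $T_{\sigma^2}$ is $((n - 1)(S^2/b), (n - 1)(S^2/a))$;
here $a$ and $b$ are a solution of
\[ P\{a < \chi^2(n - 1) < b\} = 1 - \alpha \]
and
\[ a^2 f_{n-1}(a) = b^2 f_{n-1}(b), \]
where $f_{n-1}$ is the PDF of a $\chi^2(n - 1)$ RV. Numerical solutions due to Tate and
Klett [111] may be used, but in practice, the simpler equal-tails confidence interval,
\[ \Big(\frac{(n - 1)S^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)S^2}{\chi^2_{n-1,1-\alpha/2}}\Big), \]
is employed.

Example 4. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Then $X_{(n)}$ is sufficient
for $\theta$ with density
\[ f_n(y) = n\,\frac{y^{n-1}}{\theta^n}, \qquad 0 < y < \theta. \]
The RV $T_\theta = X_{(n)}/\theta$ has PDF
\[ h(t) = nt^{n-1}, \qquad 0 < t < 1. \]
Using $T_\theta$ as pivot, we see that the confidence interval is $(X_{(n)}/b, X_{(n)}/a)$ with length
$L = X_{(n)}(1/a - 1/b)$. We minimize $L$ subject to
\[ \int_a^b nt^{n-1}\, dt = b^n - a^n = 1 - \alpha. \]
Now
\[ (1 - \alpha)^{1/n} < b \le 1 \]
and
\[ \frac{dL}{db} = X_{(n)}\Big(-\frac{1}{a^2}\frac{da}{db} + \frac{1}{b^2}\Big) = X_{(n)}\,\frac{a^{n+1} - b^{n+1}}{b^2 a^{n+1}} < 0, \]
so that the minimum occurs at $b = 1$. The shortest interval is therefore $(X_{(n)},
X_{(n)}/\alpha^{1/n})$. Note that
\[ EL = \Big(\frac{1}{a} - \frac{1}{b}\Big)EX_{(n)} = \frac{n\theta}{n+1}\Big(\frac{1}{a} - \frac{1}{b}\Big), \]
which is minimized subject to
\[ b^n - a^n = 1 - \alpha, \]
where $b = 1$ and $a = \alpha^{1/n}$. The expected length of the interval that minimizes $EL$
is $[(1/\alpha^{1/n}) - 1][n\theta/(n + 1)]$, which is also the expected length of the shortest confidence
interval based on $X_{(n)}$. Note that the length of the interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$
goes to 0 as $n \to \infty$.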
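To illustrate Example 4, the sketch below compares the shortest interval $(X_{(n)}, \alpha^{-1/n}X_{(n)})$ with another interval built from the same pivot, namely the equal-tails choice $a = (\alpha/2)^{1/n}$, $b = (1 - \alpha/2)^{1/n}$, and checks coverage by simulation. The equal-tails competitor, the function names, the seed, and the replication count are assumptions introduced here for illustration.

```python
import random


def shortest_interval(xs, alpha):
    """(X_(n), X_(n) * alpha^(-1/n)): pivot limits a = alpha^(1/n), b = 1."""
    n, m = len(xs), max(xs)
    return m, m * alpha ** (-1.0 / n)


def equal_tails_interval(xs, alpha):
    """(X_(n)/b, X_(n)/a) with b^n = 1 - alpha/2 and a^n = alpha/2."""
    n, m = len(xs), max(xs)
    a = (alpha / 2.0) ** (1.0 / n)
    b = (1.0 - alpha / 2.0) ** (1.0 / n)
    return m / b, m / a


def simulated_coverage(interval, theta, n, alpha, reps=20_000, seed=1):
    """Fraction of simulated U(0, theta) samples whose interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        lo, hi = interval(xs, alpha)
        if lo <= theta <= hi:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    sample = [0.2, 0.9, 0.5, 0.7, 0.1]  # hypothetical data
    print(shortest_interval(sample, 0.1))
    print(equal_tails_interval(sample, 0.1))
    print(simulated_coverage(shortest_interval, theta=3.0, n=5, alpha=0.1))
```

Both intervals satisfy $b^n - a^n = 1 - \alpha$, so they have the same level; the difference lies only in the length factor $1/a - 1/b$.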


For some results on asymptotically shortest-length confidence intervals, we refer
the reader to Wilks [117, pp. 374-376].


PROBLEMS 11.4


1. Let $X_1, X_2, \ldots, X_n$ be a sample from
\[ f_\theta(x) = \begin{cases} e^{-(x - \theta)} & \text{if } x > \theta, \\ 0 & \text{otherwise.} \end{cases} \]
Find the shortest-length confidence interval for $\theta$ at level $1 - \alpha$, based on a
sufficient statistic for $\theta$.


2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the shortest-length confidence
interval for $\theta$ at level $1 - \alpha$, based on a sufficient statistic for $\theta$.


3. In Problem 11.3.9, how will you find the shortest-length confidence interval for
$\theta$ at level $1 - \alpha$ based on the statistic $X/\theta$?


4. Let $T(X, \theta)$ be a pivot of the form $T(X, \theta) = T(X) - \theta$. Show how one can
construct a confidence interval for $\theta$ with fixed width $d$ and maximum possible
confidence coefficient. In particular, construct a confidence interval that has
fixed width $d$ and maximum possible confidence coefficient for the mean $\mu$ of
a normal population with variance 1. Find the smallest size $n$ for which this
confidence interval has a confidence coefficient $\ge 1 - \alpha$. Repeat the above in
sampling from an exponential PDF
\[ f_\mu(x) = e^{\mu - x} \text{ for } x > \mu \quad \text{and} \quad f_\mu(x) = 0 \text{ for } x \le \mu. \]
(Desu [20])


5. Let $X_1, X_2, \ldots, X_n$ be a random sample from
\[ f_\theta(x) = \frac{1}{2\theta}\exp\Big(-\frac{|x|}{\theta}\Big), \qquad x \in \mathbb{R}, \quad \theta > 0. \]
Find the shortest-length $(1 - \alpha)$-level confidence interval for $\theta$, based on the
sufficient statistic $\sum_{i=1}^n |X_i|$.


6. In Example 4, let $R = X_{(n)} - X_{(1)}$. Find a $(1 - \alpha)$-level confidence interval for
$\theta$ of the form $(R, R/c)$. Compare the expected length of this interval to the one
computed in Example 4.


7. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Pareto PDF $f_\theta(x) = \theta/x^2$,
$x > \theta$, and $= 0$ for $x \le \theta$. Show that the shortest-length confidence interval for
$\theta$ based on $X_{(1)}$ is $(X_{(1)}\alpha^{1/n}, X_{(1)})$. (Use $\theta/X_{(1)}$ as a pivot.)


8. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF $f_\theta(x) = 1/(\theta_2 - \theta_1)$, $\theta_1 \le x \le
\theta_2$, $\theta_1 < \theta_2$, and $= 0$ otherwise. Let $R = X_{(n)} - X_{(1)}$. Using $R/(\theta_2 - \theta_1)$ as
a pivot for estimating $\theta_2 - \theta_1$, show that the shortest-length confidence interval
is of the form $(R, R/c)$, where $c$ is determined from the level as a solution of
$c^{n-1}[(n - 1)c - n] + \alpha = 0$. (Ferentinos [24])


11.5 UNBIASED AND EQUIVARIANT CONFIDENCE INTERVALS


In Section 11.3 we studied test inversion as one of the methods of constructing con- 
fidence intervals. We showed that UMP tests lead to UMA confidence intervals. In 
Chapter 9 we saw that UMP tests generally do not exist. In such situations we either 
restrict consideration to smaller subclasses of tests by requiring that the test functions 
have some desirable properties, or we restrict the class of alternatives to those near 
the null parameter values. In this section we follow a similar approach in constructing 
confidence intervals. 


Definition 1. A family $\{S(x)\}$ of confidence sets for a parameter $\theta$ is said to be
unbiased at confidence level $1 - \alpha$ if
\[ P_\theta\{S(X) \text{ contains } \theta\} \ge 1 - \alpha \tag{1} \]
and
\[ P_\theta\{S(X) \text{ contains } \theta'\} \le 1 - \alpha \quad \text{for all } \theta, \theta' \in \Theta,\ \theta \ne \theta'. \tag{2} \]
If $S(X)$ is an interval satisfying (1) and (2), we call it a $(1 - \alpha)$-level unbiased confidence
interval. If a family of unbiased confidence sets at level $1 - \alpha$ is UMA in
the class of all $(1 - \alpha)$-level unbiased confidence sets, we call it a UMA unbiased
(UMAU) family of confidence sets at level $1 - \alpha$. In other words, if $S^*(x)$ satisfies (1)
and (2) and minimizes
\[ P_\theta\{S(X) \text{ contains } \theta'\} \quad \text{for } \theta, \theta' \in \Theta,\ \theta \ne \theta' \]
among all unbiased families of confidence sets $S(X)$ at level $1 - \alpha$, then $S^*(X)$ is a
UMAU family of confidence sets at level $1 - \alpha$.


Remark 1. Definition 1 says that a family $S(X)$ of confidence sets for a parameter
$\theta$ is unbiased at level $1 - \alpha$ if the probability of true coverage is at least $1 - \alpha$ and
that of false coverage is at most $1 - \alpha$. In other words, $S(X)$ traps a true parameter
value more often than it does a false one.


Theorem 1. Let $A(\theta_0)$ be the acceptance region of a UMP unbiased size $\alpha$ test
of $H_0(\theta_0): \theta = \theta_0$ against $H_1(\theta_0): \theta \ne \theta_0$ for each $\theta_0$. Then $S(x) = \{\theta : x \in A(\theta)\}$
is a UMA unbiased family of confidence sets at level $1 - \alpha$.


Proof. To see that $S(x)$ is unbiased, we note that since $A(\theta)$ is the acceptance
region of an unbiased test,
\[ P_\theta\{S(X) \text{ contains } \theta'\} = P_\theta\{X \in A(\theta')\} \le 1 - \alpha. \]
We next show that $S(X)$ is UMA. Let $S^*(x)$ be any other unbiased $(1 - \alpha)$-level
family of confidence sets, and write $A^*(\theta) = \{x : S^*(x) \text{ contains } \theta\}$. Then $P_\theta\{X \in
A^*(\theta')\} = P_\theta\{S^*(X) \text{ contains } \theta'\} \le 1 - \alpha$, and it follows that $A^*(\theta)$ is the acceptance
region of an unbiased size $\alpha$ test. Hence
\[ P_\theta\{S^*(X) \text{ contains } \theta'\} = P_\theta\{X \in A^*(\theta')\} \ge P_\theta\{X \in A(\theta')\} = P_\theta\{S(X) \text{ contains } \theta'\}. \]
The inequality follows since $A(\theta)$ is the acceptance region of a UMP unbiased test.
This completes the proof.


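Example 1. Let $X_1, X_2, \ldots, X_n$ be a sample from $N(\mu, \sigma^2)$ where both $\mu$ and
$\sigma^2$ are unknown. For testing $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$, it is known (Ferguson
[25, p. 232]) that the $t$-test
\[ \varphi(x) = \begin{cases} 1, & \Big|\dfrac{\sqrt{n}(\bar{x} - \mu_0)}{s}\Big| > c, \\ 0, & \text{otherwise,} \end{cases} \]
where $\bar{x} = \sum x_i/n$ and $s^2 = (n - 1)^{-1}\sum(x_i - \bar{x})^2$, is UMP unbiased. We choose $c$
from the size requirement
\[ \alpha = P_{\mu = \mu_0}\Big\{\Big|\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\Big| > c\Big\}, \]
so that $c = t_{n-1,\alpha/2}$. Thus
\[ A(\mu_0) = \Big\{x : \Big|\frac{\sqrt{n}(\bar{x} - \mu_0)}{s}\Big| \le t_{n-1,\alpha/2}\Big\} \]
is the acceptance region of a UMP unbiased size $\alpha$ test of $H_0: \mu = \mu_0$ against
$H_1: \mu \ne \mu_0$. By Theorem 1 it follows that
\[ S(x) = \{\mu : x \in A(\mu)\} = \Big\{\bar{x} - \frac{s}{\sqrt{n}}t_{n-1,\alpha/2} \le \mu \le \bar{x} + \frac{s}{\sqrt{n}}t_{n-1,\alpha/2}\Big\} \]
is a UMA unbiased family of confidence sets at level $1 - \alpha$.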


If the measure of precision of a confidence interval is its expected length, one is 
naturally led to a consideration of unbiased confidence intervals. Pratt [79] has shown 
that the expected length of a confidence interval is the average of false coverage 
probabilities. 


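Theorem 2. Let $\Theta$ be an interval on the real line and $f_\theta$ be the PDF of $X$. Let
$S(X)$ be a family of $(1 - \alpha)$-level confidence intervals of finite length; that is, let
$S(X) = (\underline{\theta}(X), \bar{\theta}(X))$, and suppose that $\bar{\theta}(X) - \underline{\theta}(X)$ is (random) finite. Then
\[ \int (\bar{\theta}(x) - \underline{\theta}(x)) f_\theta(x)\, dx = \int_{\theta' \ne \theta} P_\theta\{S(X) \text{ contains } \theta'\}\, d\theta' \tag{3} \]
for all $\theta \in \Theta$.


Proof. We have
\[ \bar{\theta} - \underline{\theta} = \int_{\underline{\theta}}^{\bar{\theta}} d\theta'. \]
Thus for all $\theta \in \Theta$,
\[ E_\theta\{\bar{\theta}(X) - \underline{\theta}(X)\} = \int (\bar{\theta}(x) - \underline{\theta}(x)) f_\theta(x)\, dx
 = \int \Big(\int_{\underline{\theta}(x)}^{\bar{\theta}(x)} d\theta'\Big) f_\theta(x)\, dx \]
\[ = \int \Big[\int_{\{x:\ \underline{\theta}(x) < \theta' < \bar{\theta}(x)\}} f_\theta(x)\, dx\Big] d\theta'
 = \int P_\theta\{S(X) \text{ contains } \theta'\}\, d\theta'
 = \int_{\theta' \ne \theta} P_\theta\{S(X) \text{ contains } \theta'\}\, d\theta'. \]


Remark 2. If $S(X)$ is a family of UMAU $(1 - \alpha)$-level confidence intervals,
the expected length of $S(X)$ is minimal. This follows since the left-hand side of (3)
is the expected length, if $\theta$ is the true value, of $S(X)$, and $P_\theta\{S(X) \text{ contains } \theta'\}$ is
minimal [because $S(X)$ is UMAU], by Theorem 1, with respect to all families of
$1 - \alpha$ unbiased confidence intervals uniformly in $\theta$ $(\theta \ne \theta')$.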
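Identity (3) can be checked numerically in a simple case. The sketch below is an illustration chosen here, not an example from the text: it takes the interval $\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}$ for a normal mean with known $\sigma$, whose expected length is $2z_{\alpha/2}\sigma/\sqrt{n}$, and integrates the coverage probability of $\theta'$ over $\theta'$ by the trapezoidal rule. The function names, the integration range of ten standard errors beyond the interval, and the step count are assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_length(sigma, n, alpha):
    """Length 2 z_{alpha/2} sigma / sqrt(n) of the interval; it is nonrandom here."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)


def coverage_of(theta_prime, theta, sigma, n, alpha):
    """P_theta{ Xbar - z*se < theta' < Xbar + z*se }, where Xbar ~ N(theta, se^2)."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    delta = (theta_prime - theta) * math.sqrt(n) / sigma
    return STD_NORMAL.cdf(delta + z) - STD_NORMAL.cdf(delta - z)


def integrated_false_coverage(theta, sigma, n, alpha, steps=4000):
    """Trapezoidal approximation to the integral of coverage_of over theta'."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    se = sigma / math.sqrt(n)
    lo = theta - (z + 10.0) * se  # the integrand is negligible outside this range
    hi = theta + (z + 10.0) * se
    h = (hi - lo) / steps
    total = 0.5 * (coverage_of(lo, theta, sigma, n, alpha)
                   + coverage_of(hi, theta, sigma, n, alpha))
    for k in range(1, steps):
        total += coverage_of(lo + k * h, theta, sigma, n, alpha)
    return total * h


if __name__ == "__main__":
    print(expected_length(2.0, 25, 0.05))
    print(integrated_false_coverage(0.0, 2.0, 25, 0.05))
```

The single point $\theta' = \theta$ has measure zero, so integrating over all $\theta'$ gives the same value as integrating over $\theta' \ne \theta$; the two printed numbers should agree up to quadrature error.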


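Since a reasonably complete discussion of UMP unbiased tests (see Section 9.5)
is beyond the scope of this book, the following procedure for determining unbiased
confidence intervals is sometimes quite useful (see Guenther [35]). Let $X_1, X_2,
\ldots, X_n$ be a sample from an absolutely continuous DF with PDF $f_\theta(x)$, and suppose
that we seek an unbiased confidence interval for $\theta$. Following the discussion in
Section 11.4, suppose that
\[ T(X_1, X_2, \ldots, X_n, \theta) = T(X, \theta) = T_\theta \]
is a pivot, and suppose that the statement
\[ P\{\lambda_1(\alpha) < T_\theta < \lambda_2(\alpha)\} = 1 - \alpha \]
can be converted to
\[ P_\theta\{\underline{\theta}(X) < \theta < \bar{\theta}(X)\} = 1 - \alpha. \]
For $(\underline{\theta}, \bar{\theta})$ to be unbiased, we must have
\[ P(\theta, \theta') = P_\theta\{\underline{\theta}(X) < \theta' < \bar{\theta}(X)\} = 1 - \alpha \quad \text{if } \theta' = \theta \tag{4} \]
and
\[ P(\theta, \theta') < 1 - \alpha \quad \text{if } \theta' \ne \theta. \tag{5} \]
If $P(\theta, \theta')$ depends only on a function $\gamma$ of $\theta$, $\theta'$, we may write
\[ P(\gamma) \begin{cases} = 1 - \alpha & \text{if } \theta' = \theta, \\ < 1 - \alpha & \text{if } \theta' \ne \theta, \end{cases} \tag{6} \]
and it follows that $P(\gamma)$ has a maximum at $\theta' = \theta$.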


Example 2. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs, and suppose that we desire
an unbiased confidence interval for $\sigma^2$. Then
\[ T(X, \sigma^2) = \frac{(n - 1)S^2}{\sigma^2} \]
has a $\chi^2(n - 1)$ distribution, and we have
\[ P\Big\{\lambda_1 < (n - 1)\frac{S^2}{\sigma^2} < \lambda_2\Big\} = 1 - \alpha, \]
so that
\[ P\Big\{(n - 1)\frac{S^2}{\lambda_2} < \sigma^2 < (n - 1)\frac{S^2}{\lambda_1}\Big\} = 1 - \alpha. \]
Then
\[ P(\sigma^2, \sigma'^2) = P_{\sigma^2}\Big\{(n - 1)\frac{S^2}{\lambda_2} < \sigma'^2 < (n - 1)\frac{S^2}{\lambda_1}\Big\} = P\Big\{\frac{T_\sigma}{\lambda_2} < \gamma < \frac{T_\sigma}{\lambda_1}\Big\}, \]
where $\gamma = \sigma'^2/\sigma^2$ and $T_\sigma \sim \chi^2(n - 1)$. Thus
\[ P(\gamma) = P\{\lambda_1\gamma < T_\sigma < \lambda_2\gamma\}. \]
Then
\[ P(1) = 1 - \alpha \quad \text{and} \quad P(\gamma) < 1 - \alpha. \]
Thus we need $\lambda_1$, $\lambda_2$ such that
\[ P(1) = 1 - \alpha \tag{7} \]
and
\[ \frac{dP(\gamma)}{d\gamma}\Big|_{\gamma = 1} = \lambda_2 f_{n-1}(\lambda_2) - \lambda_1 f_{n-1}(\lambda_1) = 0, \tag{8} \]
where $f_{n-1}$ is the PDF of $T_\sigma$. Equations (7) and (8) have been solved numerically for
$\lambda_1$, $\lambda_2$ by several authors (see, for example, Tate and Klett [111]). Having obtained
$\lambda_1$, $\lambda_2$ from (7) and (8), we have as the unbiased $(1 - \alpha)$-level confidence interval
\[ \Big((n - 1)\frac{S^2}{\lambda_2},\ (n - 1)\frac{S^2}{\lambda_1}\Big). \tag{9} \]
Note that in this case the shortest-length confidence interval (based on $T_\sigma$) derived
in Example 11.4.3, the usual equal-tails confidence interval, and (9) are all different.
The length of the confidence interval (9), however, can be considerably greater than
that of the shortest interval of Example 11.4.3. For large $n$ all three sets of intervals
are approximately the same.


Finally, let us briefly investigate how invariance considerations apply to confidence
estimation. Let $X = (X_1, X_2, \ldots, X_n) \sim f_\theta$, $\theta \in \Theta \subseteq \mathbb{R}$. Let $G$ be a group
of transformations on $\mathfrak{X}$ that leaves $\mathcal{P} = \{f_\theta : \theta \in \Theta\}$ invariant. Let $S(X)$ be a
$(1 - \alpha)$-level confidence set for $\theta$.


Definition 2. Let $\mathcal{P}$ be invariant under $G$, and let $S(x)$ be a confidence set for $\theta$.
Then $S$ is equivariant under $G$ if for every $x \in \mathfrak{X}$, $\theta \in \Theta$, and $g \in G$,
\[ S(x) \ni \theta \iff S(g(x)) \ni \bar{g}\theta. \tag{10} \]

Example 3. Let $X_1, X_2, \ldots, X_n$ be a sample from PDF
\[ f_\theta(x) = \exp[-(x - \theta)], \qquad x > \theta, \]
and $= 0$ if $x \le \theta$. Let $G = \{\{a, 1\} : a \in \mathbb{R}\}$, where $\{a, 1\}x = (x_1 + a, x_2 +
a, \ldots, x_n + a)$, and $G$ induces $\bar{G} = G$ on $\Theta = \mathbb{R}$. The family $\{f_\theta\}$ remains invariant
under $G$. Consider a confidence interval of the form
\[ S(x) = \{\theta : \bar{x} - c_1 \le \theta \le \bar{x} + c_2\}, \]
where $c_1$, $c_2$ are constants. Then
\[ S(\{a, 1\}x) = \{\theta : \bar{x} + a - c_1 \le \theta \le \bar{x} + a + c_2\}. \]
Clearly,
\[ S(x) \ni \theta \iff \bar{x} + a - c_1 \le \theta + a \le \bar{x} + a + c_2 \iff S(\{a, 1\}x) \ni \bar{g}\theta, \]
and it follows that $S(x)$ is an equivariant confidence interval.


The most useful method of constructing invariant confidence intervals is test inversion.
Inverting the acceptance region of invariant tests often leads to equivariant
confidence intervals under certain conditions. Recall that a group $G$ of transformations
leaves a hypothesis-testing problem invariant if $G$ leaves both $\Theta_0$ and $\Theta_1$ invariant.
For each $H_0: \theta = \theta_0$, $\theta_0 \in \Theta$, we have a different group of transformations,
$G_{\theta_0}$, which leaves the problem of testing $\theta = \theta_0$ invariant. The equivariant confidence
interval, on the other hand, must be equivariant with respect to $G$, which is a much
larger group since $G \supset G_{\theta_0}$ for all $\theta_0$. The relationship between an equivariant confidence
set and invariant tests is more complicated when the family $\mathcal{P}$ has a nuisance
parameter $\tau$.

Under certain conditions there is a relationship between equivariant confidence
sets and associated invariant tests. Rather than pursue this relationship, we refer the
reader to Ferguson [27, p. 262]; it is generally easy to check that (10) holds for
a given confidence interval $S$ to show that $S$ is invariant. The following example
illustrates this point.


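Example 4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are
unknown. In Example 9.5.3 we showed that the test
\[ \varphi(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^n (x_i - \bar{x})^2 \le \sigma_0^2\,\chi^2_{n-1,1-\alpha}, \\ 0 & \text{otherwise} \end{cases} \]
is UMP invariant, under the translation group, for testing $H_0: \sigma^2 \ge \sigma_0^2$ against $H_1:
\sigma^2 < \sigma_0^2$. Then the acceptance region of $\varphi$ is
\[ A(\sigma_0^2) = \Big\{x : \sum_{i=1}^n (x_i - \bar{x})^2 > \sigma_0^2\,\chi^2_{n-1,1-\alpha}\Big\}. \]
Clearly,
\[ x \in A(\sigma_0^2) \iff \sigma_0^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha}}, \]
and it follows that
\[ S(x) = \Big\{\sigma^2 : \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha}}\Big\} \]
is a $(1 - \alpha)$-level confidence interval (upper confidence bound) for $\sigma^2$. We show that
$S$ is invariant with respect to the scale group. In fact,
\[ S(\{0, c\}x) = \Big\{\sigma^2 : \sigma^2 < \frac{c^2(n - 1)s^2}{\chi^2_{n-1,1-\alpha}}\Big\} \]
and
\[ \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha}} \iff S(\{0, c\}x) \ni \bar{g}\sigma^2 = \{0, c\}\sigma^2, \]
and it follows that $S(x)$ is an equivariant confidence interval for $\sigma^2$.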


PROBLEMS 11.5 


1. Let $X_1, X_2, \ldots, X_n$ be a sample from $U(0, \theta)$. Show that the unbiased confidence
interval for $\theta$ based on the pivot $\max X_i/\theta$ coincides with the shortest-length
confidence interval based on the same pivot.


2. Let $X_1, X_2, \ldots, X_n$ be a sample from $G(1, \theta)$. Find the unbiased confidence
interval for $\theta$ based on the pivot $2 \sum_{i=1}^{n} X_i/\theta$.


3. Let $X_1, X_2, \ldots, X_n$ be a sample from the PDF

$$
f_\theta(x) =
\begin{cases}
e^{-(x-\theta)} & \text{if } x > \theta, \\
0 & \text{otherwise.}
\end{cases}
$$

Find the unbiased confidence interval based on the pivot $2n[\min X_i - \theta]$.


4. Let $X_1, X_2, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown.
Using the pivot $T_{\mu,\sigma} = \sqrt{n}(\bar{X} - \mu)/S$, show that the shortest-length unbiased
$(1-\alpha)$-level confidence interval for $\mu$ is the equal-tails interval
$(\bar{X} - t_{n-1,\alpha/2} S/\sqrt{n},\ \bar{X} + t_{n-1,\alpha/2} S/\sqrt{n})$.




5. Let $X_1, X_2, \ldots, X_n$ be iid with PDF $f_\theta(x) = \theta/x^2$, $x > \theta$, and $= 0$ otherwise.
Find the shortest-length $(1-\alpha)$-level unbiased confidence interval for $\theta$ based
on the pivot $\theta/X_{(1)}$.


6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a location family
$\mathcal{P} = \{ f_\theta(x) = f(x - \theta);\ \theta \in \mathbb{R} \}$. Show that a confidence interval of the form
$S(\mathbf{x}) = \{\theta : T(\mathbf{x}) - c_1 \le \theta \le T(\mathbf{x}) + c_2\}$, where $T(\mathbf{x})$ is an equivariant estimate
under the location group, is an equivariant confidence interval.


7. Let $X_1, X_2, \ldots, X_n$ be iid RVs with common scale PDF $f_\sigma(x) = (1/\sigma) f(x/\sigma)$,
$\sigma > 0$. Consider the scale group $G = \{\{0, b\} : b > 0\}$. If $T(\mathbf{x})$ is an equivariant
estimate of $\sigma$, show that a confidence interval of the form

$$
S(\mathbf{x}) = \Bigl\{ \sigma : c_1 \le \frac{T(\mathbf{x})}{\sigma} \le c_2 \Bigr\}
$$

is equivariant.


8. Let $X_1, X_2, \ldots, X_n$ be iid RVs with PDF $f_\theta(x) = \exp[-(x - \theta)]$, $x > \theta$, and
$= 0$ otherwise. For testing $H_0 : \theta = \theta_0$ against $H_1 : \theta > \theta_0$, consider the
(UMP) test

$$
\varphi(\mathbf{x}) =
\begin{cases}
1 & \text{if } x_{(1)} > \theta_0 - \dfrac{\ln \alpha}{n}, \\
0 & \text{otherwise.}
\end{cases}
$$

Is the acceptance region of this $\alpha$-level test an equivariant $(1-\alpha)$-level confidence
interval (lower bound) for $\theta$ with respect to the location group?


CHAPTER 12 


General Linear Hypothesis 


12.1 INTRODUCTION 


This chapter deals with the general linear hypothesis. In a wide variety of problems 
the experimenter is interested in making inferences about a vector parameter. For 
example, he may wish to estimate the mean of a multivariate normal or to test some 
hypotheses concerning the mean vector. The problem of estimation can be solved, for 
example, by resorting to the method of maximum likelihood estimation, discussed 
in Section 8.7. In this chapter we restrict ourselves to linear model problems and 
concern ourselves mainly with problems of hypothesis testing. 

In Section 12.2 we formally describe the general model and derive a test in com- 
plete generality. In the next four sections we demonstrate the power of this test by 
solving four important testing problems. We need a considerable amount of linear 
algebra in Section 12.2. 


12.2. GENERAL LINEAR HYPOTHESIS 


A wide variety of problems of hypothesis testing can be treated under a general 
setup. In this section we state the general problem, and derive the test statistic and its 
distribution. Consider the following examples. 


Example 1. Let $Y_1, Y_2, \ldots, Y_k$ be independent RVs with $EY_i = \mu_i$, $i = 1, 2, \ldots, k$,
and common variance $\sigma^2$. Also, $n_i$ observations are taken on $Y_i$, $i = 1, 2, \ldots, k$,
and $\sum_{i=1}^{k} n_i = n$. It is required to test $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$. The
case k = 2 has already been treated in Section 10.4. Problems of this nature arise 
quite naturally, for example, in agricultural experiments where one is interested in 
comparing the average yield when k fertilizers are available. 


Example 2. An experimenter observes the velocity of a particle moving along a
line. He takes observations at given times $t_1, t_2, \ldots, t_n$. Let $\beta_1$ be the initial velocity
of the particle and $\beta_2$ be the acceleration; then the velocity at time $t$ is given by
$y = \beta_1 + \beta_2 t + \varepsilon$, where $\varepsilon$ is an RV that is nonobservable (e.g., an error in measurement).
In practice, the experimenter does not know $\beta_1$ and $\beta_2$ and has to use the random
observations $Y_1, Y_2, \ldots, Y_n$ made at times $t_1, t_2, \ldots, t_n$, respectively, to obtain some
information about the unknown parameters $\beta_1, \beta_2$.

A similar example is the case when the relation between $y$ and $t$ is governed by

$$
y = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon,
$$

where $t$ is a mathematical variable, $\beta_0, \beta_1, \beta_2$ are unknown parameters, and $\varepsilon$ is a
nonobservable RV. The experimenter takes observations $Y_1, Y_2, \ldots, Y_n$ at predetermined
values $t_1, t_2, \ldots, t_n$, respectively, and is interested in testing the hypothesis
that the relation is in fact linear, that is, $\beta_2 = 0$.


Examples of the type discussed above and their much more complicated variants 
can all be treated under a general setup. To fix ideas, let us first make the following 
definition. 


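Definition 1. Let $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$ be a random column vector and $\mathbf{X}$ be an
$n \times k$ matrix, $k < n$, of known constants $x_{ij}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$. We
say that the distribution of $\mathbf{Y}$ satisfies a linear model if

$$
E\mathbf{Y} = \mathbf{X}\boldsymbol\beta, \tag{1}
$$

where $\boldsymbol\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a vector of unknown (scalar) parameters
$\beta_1, \beta_2, \ldots, \beta_k$. It is convenient to write

$$
\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon, \tag{2}
$$

where $\boldsymbol\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable RVs with $E\varepsilon_j = 0$,
$j = 1, 2, \ldots, n$. Relation (2) is known as a linear model. Then the general linear
hypothesis concerns $\boldsymbol\beta$, namely, that $\boldsymbol\beta$ satisfies $H_0 : \mathbf{H}\boldsymbol\beta = \mathbf{0}$, where $\mathbf{H}$ is a known
$r \times k$ matrix with $r \le k$.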


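In what follows we assume that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent, normal RVs with
common variance $\sigma^2$ and $E\varepsilon_j = 0$, $j = 1, 2, \ldots, n$. In view of (2), it follows that
$Y_1, Y_2, \ldots, Y_n$ are independent normal RVs with

$$
EY_i = \sum_{j=1}^{k} x_{ij}\beta_j \quad\text{and}\quad \operatorname{var}(Y_i) = \sigma^2, \qquad i = 1, 2, \ldots, n. \tag{3}
$$

We assume that $\mathbf{H}$ is a matrix of full rank $r$, $r \le k$, and $\mathbf{X}$ is a matrix of full rank
$k < n$. Some remarks are in order.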


Remark 1. Clearly, $\mathbf{Y}$ satisfies a linear model if the vector of means
$E\mathbf{Y} = (EY_1, EY_2, \ldots, EY_n)'$ lies in a $k$-dimensional subspace generated by the linearly
independent column vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ of the matrix $\mathbf{X}$. Indeed, (1) states that
$E\mathbf{Y}$ is a linear combination of the known vectors $\mathbf{x}_1, \ldots, \mathbf{x}_k$. The general linear
hypothesis $H_0 : \mathbf{H}\boldsymbol\beta = \mathbf{0}$ states that the parameters $\beta_1, \beta_2, \ldots, \beta_k$ satisfy $r$ independent
homogeneous linear restrictions. It follows that under $H_0$, $E\mathbf{Y}$ lies in a $(k - r)$-dimensional
subspace of the $k$-space generated by $\mathbf{x}_1, \ldots, \mathbf{x}_k$.


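Remark 2. The assumption of normality, which is conventional, is made to compute
the likelihood ratio test statistic of $H_0$ and its distribution. If the problem is to
estimate $\boldsymbol\beta$, no such assumption is needed. One can use the principle of least squares
and estimate $\boldsymbol\beta$ by minimizing the sum of squares,

$$
\sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol\varepsilon'\boldsymbol\varepsilon = (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta). \tag{4}
$$

The minimizing value $\hat{\boldsymbol\beta}(\mathbf{Y})$ is known as a least squares estimate of $\boldsymbol\beta$. This is not a
difficult problem and we do not discuss it here in any detail but mention only that
any solution of the normal equations

$$
\mathbf{X}'\mathbf{X}\boldsymbol\beta = \mathbf{X}'\mathbf{Y} \tag{5}
$$

is a least squares estimator. If the rank of $\mathbf{X}$ is $k\ (< n)$, then $\mathbf{X}'\mathbf{X}$, which has the same
rank as $\mathbf{X}$, is a nonsingular matrix that can be inverted to give a unique least squares
estimator

$$
\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}. \tag{6}
$$

If the rank of $\mathbf{X}$ is $< k$, then $\mathbf{X}'\mathbf{X}$ is singular and the normal equations do not have
a unique solution. One can show, for example, that $\hat{\boldsymbol\beta}$ is unbiased for $\boldsymbol\beta$, and if the
$Y_i$'s are uncorrelated with common variance $\sigma^2$, the variance-covariance matrix of
the $\hat\beta_i$'s is given by

$$
E\bigl\{ (\hat{\boldsymbol\beta} - \boldsymbol\beta)(\hat{\boldsymbol\beta} - \boldsymbol\beta)' \bigr\} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}. \tag{7}
$$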
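As an illustration of the normal equations (5) and the estimator (6), here is a minimal sketch using NumPy. The function name `least_squares` and the small design matrix are hypothetical and chosen only so that the answer is known in advance.

```python
import numpy as np

def least_squares(X, y):
    """Solve the normal equations X'X b = X'y; assumes X has full column rank."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical design: a column of ones plus one regressor.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # exactly 1 + 2x, so no residual error

beta_hat = least_squares(X, y)
```

Solving the linear system directly, rather than forming $(\mathbf{X}'\mathbf{X})^{-1}$ explicitly, is a common numerical choice; for ill-conditioned designs a QR-based routine such as `np.linalg.lstsq` is usually preferable.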


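Remark 3. One can similarly compute the restricted least squares estimator of
$\boldsymbol\beta$ by the usual method of Lagrange multipliers. For example, under $H_0 : \mathbf{H}\boldsymbol\beta = \mathbf{0}$,
one simply minimizes $(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$ subject to $\mathbf{H}\boldsymbol\beta = \mathbf{0}$ to get the restricted
least squares estimator $\hat{\hat{\boldsymbol\beta}}$. The important point is that if $\boldsymbol\varepsilon$ is assumed to be a multivariate
normal RV with mean vector $\mathbf{0}$ and dispersion matrix $\sigma^2 \mathbf{I}_n$, the MLE of $\boldsymbol\beta$ is
the same as the least squares estimator. In fact, one can show that $\hat\beta_i$ is the UMVUE
of $\beta_i$, $i = 1, 2, \ldots, k$, by the usual methods.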


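Example 3. Suppose that a random variable $Y$ is linearly related to a mathematical
variable $x$ that is not random (see Example 2). Let $Y_1, Y_2, \ldots, Y_n$ be observations
made at different known values $x_1, x_2, \ldots, x_n$ of $x$. For example, $x_1, x_2, \ldots, x_n$
may represent different levels of fertilizer, and $Y_1, Y_2, \ldots, Y_n$, respectively,
the corresponding yields of a crop. Also, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ represent unobservable RVs
that may be errors of measurements. Then

$$
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,
$$

and we wish to test whether $\beta_1 = 0$, that is, that the fertilizer levels do not affect the yield.
Here

$$
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\boldsymbol\beta = (\beta_0, \beta_1)', \quad\text{and}\quad \boldsymbol\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'.
$$

The hypothesis to be tested is $H_0 : \beta_1 = 0$, so that with $\mathbf{H} = (0, 1)$, the null hypothesis
can be written as $H_0 : \mathbf{H}\boldsymbol\beta = 0$. This is a problem of linear regression.
Similarly, we may assume that the regression of $Y$ on $x$ is quadratic:

$$
Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon,
$$

and we may wish to test that a linear function will be sufficient to describe the relationship,
that is, $\beta_2 = 0$. Here $\mathbf{X}$ is the $n \times 3$ matrix

$$
\mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad
\boldsymbol\beta = (\beta_0, \beta_1, \beta_2)', \quad\text{and}\quad \boldsymbol\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)',
$$

and $\mathbf{H}$ is the $1 \times 3$ matrix $(0, 0, 1)$.
In another example of regression, the $Y$'s can be written as

$$
Y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon,
$$

and we wish to test the hypothesis that $\beta_1 = \beta_2 = \beta_3$. In this case, $\mathbf{X}$ is the matrix

$$
\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{pmatrix},
$$

and $\mathbf{H}$ may be chosen to be the $2 \times 3$ matrix

$$
\mathbf{H} = \begin{pmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \end{pmatrix}.
$$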



Example 4. Another important example of the general linear hypothesis involves
the analysis of variance. We have already derived tests of hypotheses regarding the
equality of the means of two normal populations when the variances are equal. In
practice, one is frequently interested in the equality of several means when the variances
are the same, that is, one has $k$ samples from $N(\mu_1, \sigma^2), \ldots, N(\mu_k, \sigma^2)$,
where $\sigma^2$ is unknown and one wants to test $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ (see Example 1).
Such a situation is of common occurrence in agricultural experiments.
Suppose that $k$ treatments are applied to experimental units (plots), the $i$th treatment
is applied to $n_i$ randomly chosen units, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} n_i = n$, and the observation
$y_{ij}$ represents some numerical characteristic (yield) of the $j$th experimental
unit under the $i$th treatment. Suppose also that

$$
Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i; \quad i = 1, 2, \ldots, k,
$$

where $\varepsilon_{ij}$ are iid $N(0, \sigma^2)$ RVs. We are interested in testing $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$.
We write

$$
\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})', \qquad
\boldsymbol\beta = (\mu_1, \mu_2, \ldots, \mu_k)',
$$

and

$$
\mathbf{X} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{1}_{n_2} & \cdots & \mathbf{0} \\
\vdots & \vdots & & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_{n_k}
\end{pmatrix},
$$

where $\mathbf{1}_{n_i} = (1, 1, \ldots, 1)'$ is the $n_i$-vector ($i = 1, 2, \ldots, k$), each of whose elements
is unity. Thus $\mathbf{X}$ is $n \times k$. We can choose

$$
\mathbf{H} = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & -1
\end{pmatrix}
$$

so that $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ is of the form $\mathbf{H}\boldsymbol\beta = \mathbf{0}$. Here $\mathbf{H}$ is a $(k - 1) \times k$
matrix.

The model described in this example is frequently referred to as a one-way analysis
of variance model. This is a very simple example of an analysis of variance
model. Note that the matrix $\mathbf{X}$ is of a very special type; namely, the elements of $\mathbf{X}$
are either 0 or 1. $\mathbf{X}$ is known as a design matrix.
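The block structure of the design matrix and the contrast matrix in Example 4 can be built mechanically. The sketch below is illustrative only: the helper names `one_way_design` and `equal_means_H` and the group sizes are assumptions, not notation from the text.

```python
import numpy as np

def one_way_design(group_sizes):
    """n x k design matrix whose ith column is 1 on the rows of group i, 0 elsewhere."""
    n, k = sum(group_sizes), len(group_sizes)
    X = np.zeros((n, k))
    row = 0
    for i, n_i in enumerate(group_sizes):
        X[row:row + n_i, i] = 1.0
        row += n_i
    return X

def equal_means_H(k):
    """(k-1) x k matrix with rows (1, 0, ..., -1, ..., 0) encoding mu_1 = mu_j."""
    H = np.zeros((k - 1, k))
    H[:, 0] = 1.0
    H[np.arange(k - 1), np.arange(1, k)] = -1.0
    return H

X = one_way_design([2, 3, 1])   # hypothetical group sizes n_1, n_2, n_3
H = equal_means_H(3)
```

Each row of `X` contains a single 1, and `H` annihilates any vector with equal components, which is the statement $\mathbf{H}\boldsymbol\beta = \mathbf{0}$ under $H_0$.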


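Returning to our general model

$$
\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon,
$$

we wish to test the null hypothesis $H_0 : \mathbf{H}\boldsymbol\beta = \mathbf{0}$. We will compute the generalized
likelihood ratio test and the distribution of the test statistic. To do so, we assume that
$\boldsymbol\varepsilon$ has a multivariate normal distribution with mean vector $\mathbf{0}$ and variance-covariance
matrix $\sigma^2 \mathbf{I}_n$, where $\sigma^2$ is unknown and $\mathbf{I}_n$ is the $n \times n$ identity matrix. This means
that $\mathbf{Y}$ has an $n$-variate normal distribution with mean $\mathbf{X}\boldsymbol\beta$ and dispersion matrix $\sigma^2 \mathbf{I}_n$
for some $\boldsymbol\beta$ and some $\sigma^2$, both unknown. Here the parameter space $\Theta$ is the set of
$(k + 1)$-tuples $(\boldsymbol\beta', \sigma^2) = (\beta_1, \beta_2, \ldots, \beta_k, \sigma^2)$, and the joint PDF of the $Y$'s is given
by

$$
\begin{aligned}
f_{\boldsymbol\beta,\sigma^2}(y_1, y_2, \ldots, y_n)
&= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\Bigl\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2 \Bigr\} \\
&= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\Bigl\{ -\frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta) \Bigr\}.
\end{aligned} \tag{8}
$$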


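Theorem 1. Consider the linear model

$$
\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon,
$$

where $\mathbf{X}$ is an $n \times k$ matrix, $(x_{ij})$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$, of known
constants and full rank $k < n$, $\boldsymbol\beta$ is a vector of unknown parameters $\beta_1, \beta_2, \ldots, \beta_k$,
and $\boldsymbol\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is a vector of nonobservable independent normal RVs with
common variance $\sigma^2$ and mean $E\boldsymbol\varepsilon = \mathbf{0}$. The GLR test for testing the linear hypothesis
$H_0 : \mathbf{H}\boldsymbol\beta = \mathbf{0}$, where $\mathbf{H}$ is an $r \times k$ matrix of full rank $r \le k$, is to reject $H_0$ at
level $\alpha$ if $F > F_0$, where $P_{H_0}\{F > F_0\} = \alpha$ and $F$ is the RV given by

$$
F = \frac{(\mathbf{Y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}})'(\mathbf{Y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}}) - (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})}{(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta})}. \tag{9}
$$

In (9), $\hat{\boldsymbol\beta}$ and $\hat{\hat{\boldsymbol\beta}}$ are the MLEs of $\boldsymbol\beta$ under $\Theta$ and $\Theta_0$, respectively. Moreover, the RV
$[(n - k)/r]F$ has the $F$-distribution with $(r, n - k)$ d.f. under $H_0$.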
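A minimal numerical sketch of the statistic in (9) follows. It is not taken from the text: the function name `glh_f_statistic` and the simulated data are assumptions, and the restricted estimator is computed from the standard Lagrange-multiplier solution $\hat{\hat{\boldsymbol\beta}} = \hat{\boldsymbol\beta} - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'[\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}']^{-1}\mathbf{H}\hat{\boldsymbol\beta}$, which the text leaves as an exercise.

```python
import numpy as np

def glh_f_statistic(X, y, H):
    """Return (F, [(n-k)/r] F) for H0: H beta = 0, with X and H of full rank."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    H = np.atleast_2d(np.asarray(H, dtype=float))
    n, k = X.shape
    r = H.shape[0]
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y                    # unrestricted estimator (6)
    A = xtx_inv @ H.T
    # Restricted estimator: project beta_hat onto the null space of H.
    beta_res = beta_hat - A @ np.linalg.solve(H @ A, H @ beta_hat)
    rss = np.sum((y - X @ beta_hat) ** 2)           # denominator of (9)
    rss0 = np.sum((y - X @ beta_res) ** 2)          # restricted sum of squares
    F = (rss0 - rss) / rss
    return F, (n - k) / r * F

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 12)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + rng.normal(size=x.size)    # hypothetical data generated with zero slope
F, F_scaled = glh_f_statistic(X, y, [[0.0, 1.0]])
```

The second returned value is the quantity to compare with the upper $\alpha$ point of $F(r, n - k)$; here $r = 1$ and $n - k = 10$.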


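Proof. The GLR test of $H_0 : \mathbf{H}\boldsymbol\beta = \mathbf{0}$ is to reject $H_0$ if and only if $\lambda(\mathbf{y}) < c$,
where

$$
\lambda(\mathbf{y}) = \frac{\sup_{\boldsymbol\theta \in \Theta_0} f_{\boldsymbol\beta,\sigma^2}(\mathbf{y})}{\sup_{\boldsymbol\theta \in \Theta} f_{\boldsymbol\beta,\sigma^2}(\mathbf{y})}, \tag{10}
$$

$\boldsymbol\theta = (\boldsymbol\beta', \sigma^2)'$, and $\Theta_0 = \{(\boldsymbol\beta', \sigma^2)' : \mathbf{H}\boldsymbol\beta = \mathbf{0}\}$. Let $\hat{\boldsymbol\theta} = (\hat{\boldsymbol\beta}', \hat\sigma^2)'$ be the MLE of
$\boldsymbol\theta \in \Theta$, and $\hat{\hat{\boldsymbol\theta}} = (\hat{\hat{\boldsymbol\beta}}', \hat{\hat\sigma}^2)'$ be the MLE of $\boldsymbol\theta$ under $H_0$, that is, when $\mathbf{H}\boldsymbol\beta = \mathbf{0}$. It is
easily seen that $\hat{\boldsymbol\beta}$ is the value of $\boldsymbol\beta$ that minimizes $(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$, and

$$
\hat\sigma^2 = n^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}). \tag{11}
$$

Similarly, $\hat{\hat{\boldsymbol\beta}}$ is the value of $\boldsymbol\beta$ that minimizes $(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$ subject to $\mathbf{H}\boldsymbol\beta = \mathbf{0}$,
and

$$
\hat{\hat\sigma}^2 = n^{-1} (\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}}). \tag{12}
$$

It follows that

$$
\lambda(\mathbf{y}) = \Bigl( \frac{\hat\sigma^2}{\hat{\hat\sigma}^2} \Bigr)^{n/2}. \tag{13}
$$

The critical region $\lambda(\mathbf{y}) < c$ is equivalent to the region $\{\lambda(\mathbf{y})\}^{-2/n} > c^{-2/n}$, which
is of the form

$$
\frac{\hat{\hat\sigma}^2}{\hat\sigma^2} > c_1. \tag{14}
$$

This may be written as

$$
\frac{(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}})}{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})} > c_1 \tag{15}
$$

or, equivalently, as

$$
\frac{(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}}) - (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})}{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})} > c_1 - 1. \tag{16}
$$

It remains to determine the distribution of the test statistic. For this purpose it
is convenient to reduce the problem to the canonical form. Let $V_n$ be the vector
space of the observation vector $\mathbf{Y}$, $V_k$ be the subspace of $V_n$ generated by the column
vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ of $\mathbf{X}$, and $V_{k-r}$ be the subspace of $V_k$ in which $E\mathbf{Y}$
is postulated to lie under $H_0$. We change variables from $Y_1, Y_2, \ldots, Y_n$ to $Z_1, Z_2, \ldots, Z_n$,
where $Z_1, Z_2, \ldots, Z_n$ are independent normal RVs with common variance
$\sigma^2$ and means $EZ_i = \theta_i$, $i = 1, 2, \ldots, k$, $EZ_i = 0$, $i = k + 1, \ldots, n$. This
is done as follows. Let us choose an orthonormal basis of $k - r$ column vectors
$\{\boldsymbol\alpha_i\}$ for $V_{k-r}$, say $\{\boldsymbol\alpha_{r+1}, \boldsymbol\alpha_{r+2}, \ldots, \boldsymbol\alpha_k\}$. We extend this to an orthonormal basis
$\{\boldsymbol\alpha_1, \boldsymbol\alpha_2, \ldots, \boldsymbol\alpha_r, \boldsymbol\alpha_{r+1}, \ldots, \boldsymbol\alpha_k\}$ for $V_k$, and then extend once again to an orthonormal
basis $\{\boldsymbol\alpha_1, \boldsymbol\alpha_2, \ldots, \boldsymbol\alpha_k, \boldsymbol\alpha_{k+1}, \ldots, \boldsymbol\alpha_n\}$ for $V_n$. This is always possible.

Let $z_1, z_2, \ldots, z_n$ be the coordinates of $\mathbf{y}$ relative to the basis $\{\boldsymbol\alpha_1, \boldsymbol\alpha_2, \ldots, \boldsymbol\alpha_n\}$.
Then $z_i = \boldsymbol\alpha_i'\mathbf{y}$ and $\mathbf{z} = \mathbf{P}\mathbf{y}$, where $\mathbf{P}$ is an orthogonal matrix with $i$th row $\boldsymbol\alpha_i'$. Thus
$EZ_i = E\boldsymbol\alpha_i'\mathbf{Y} = \boldsymbol\alpha_i'\mathbf{X}\boldsymbol\beta$, and $E\mathbf{Z} = \mathbf{P}\mathbf{X}\boldsymbol\beta$. Since $\mathbf{X}\boldsymbol\beta \in V_k$ (Remark 1), it follows
that $\boldsymbol\alpha_i'\mathbf{X}\boldsymbol\beta = 0$ for $i > k$. Similarly, under $H_0$, $\mathbf{X}\boldsymbol\beta \in V_{k-r} \subset V_k$, so that $\boldsymbol\alpha_i'\mathbf{X}\boldsymbol\beta = 0$
for $i \le r$. Let us write $\boldsymbol\omega = \mathbf{P}\mathbf{X}\boldsymbol\beta$. Then $\omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0$, and
under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$. Finally, from Corollary 2 of Theorem 5.4.6
it follows that $Z_1, Z_2, \ldots, Z_n$ are independent normal RVs with the same variance
$\sigma^2$ and $EZ_i = \omega_i$, $i = 1, 2, \ldots, n$. We have thus transformed the problem to the
following simpler canonical form:

$$
\begin{aligned}
\Omega &: Z_i \text{ are independent } N(\omega_i, \sigma^2),\ i = 1, 2, \ldots, n, \quad \omega_{k+1} = \omega_{k+2} = \cdots = \omega_n = 0, \\
H_0 &: \omega_1 = \omega_2 = \cdots = \omega_r = 0.
\end{aligned} \tag{17}
$$

Now

$$
\begin{aligned}
(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)
&= (\mathbf{P}'\mathbf{z} - \mathbf{P}'\boldsymbol\omega)'(\mathbf{P}'\mathbf{z} - \mathbf{P}'\boldsymbol\omega) \\
&= (\mathbf{z} - \boldsymbol\omega)'(\mathbf{z} - \boldsymbol\omega) \\
&= \sum_{i=1}^{k} (z_i - \omega_i)^2 + \sum_{i=k+1}^{n} z_i^2.
\end{aligned} \tag{18}
$$

The quantity $(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$ is minimized if we choose $\hat\omega_i = z_i$,
$i = 1, 2, \ldots, k$, so that

$$
(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) = \sum_{i=k+1}^{n} z_i^2. \tag{19}
$$

Under $H_0$, $\omega_1 = \omega_2 = \cdots = \omega_r = 0$, so that $(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$ will be
minimized if we choose $\hat{\hat\omega}_i = z_i$, $i = r + 1, \ldots, k$. Thus

$$
(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}})'(\mathbf{y} - \mathbf{X}\hat{\hat{\boldsymbol\beta}}) = \sum_{i=1}^{r} z_i^2 + \sum_{i=k+1}^{n} z_i^2. \tag{20}
$$

It follows that

$$
F = \frac{\sum_{i=1}^{r} Z_i^2}{\sum_{i=k+1}^{n} Z_i^2}.
$$

Now $\sum_{i=k+1}^{n} Z_i^2/\sigma^2$ has a $\chi^2(n - k)$ distribution, and under $H_0$, $\sum_{i=1}^{r} Z_i^2/\sigma^2$ has
a $\chi^2(r)$ distribution. Since $\sum_{i=1}^{r} Z_i^2$ and $\sum_{i=k+1}^{n} Z_i^2$ are independent, we see that
$[(n - k)/r]F$ is distributed as $F(r, n - k)$ under $H_0$, as asserted. This completes the
proof of the theorem.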
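The key identity (19) of the canonical reduction can be verified numerically. The sketch below is an illustration under assumptions: the design matrix and observations are simulated, and the orthonormal basis is obtained from a complete QR factorization, whose first $k$ columns span $V_k$. The ordering of basis vectors within $V_k$ differs from the one used in the proof, but (19) only involves the last $n - k$ coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
X = rng.normal(size=(n, k))   # hypothetical full-rank design matrix
y = rng.normal(size=n)        # hypothetical observation vector

# Q is n x n orthogonal: its first k columns span the column space of X,
# the remaining n - k columns span the orthogonal complement.
Q, _ = np.linalg.qr(X, mode="complete")
z = Q.T @ y                   # coordinates of y in the orthonormal basis

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta_hat) ** 2)   # left side of (19)
rss_canonical = np.sum(z[k:] ** 2)      # right side of (19)
```

Up to floating-point error, `rss` and `rss_canonical` agree, and the total sum of squares of `z` equals that of `y` because the transformation is orthogonal.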


Remark 4. In practice, one does not need to find a transformation that reduces
the problem to the canonical form. As will be done in the following sections, one
simply computes the estimators $\hat{\boldsymbol\theta}$ and $\hat{\hat{\boldsymbol\theta}}$ and then computes the test statistic in any
of the equivalent forms (14), (15), or (16) to apply the $F$-test.


Remark 5. The computation of $\hat{\boldsymbol\beta}$, $\hat{\hat{\boldsymbol\beta}}$ is greatly facilitated, in view of Remark 3,
by using the principle of least squares. Indeed, this was done in the proof of Theorem 1
when we reduced the problem of maximum likelihood estimation to that of
minimization of the sum of squares $(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$.


Remark 6. The distribution of the test statistic under $H_1$ is easily determined. We
note that $Z_i/\sigma \sim N(\omega_i/\sigma, 1)$ for $i = 1, 2, \ldots, r$, so that $\sum_{i=1}^{r} Z_i^2/\sigma^2$ has a noncentral
chi-square distribution with $r$ d.f. and noncentrality parameter $\delta = \sum_{i=1}^{r} \omega_i^2/\sigma^2$.
It follows that $[(n - k)/r]F$ has a noncentral $F$-distribution with d.f. $(r, n - k)$
and noncentrality parameter $\delta$. Under $H_0$, $\delta = 0$, so that $[(n - k)/r]F$ has a central
$F(r, n - k)$ distribution. Since $\sum_{i=1}^{r} \omega_i^2 = \sum_{i=1}^{r} (EZ_i)^2$, it follows from (19)
and (20) that if we replace each observation $Y_i$ by its expected value in the numerator
of (16), we get $\sigma^2\delta$.


Remark 7. The general linear hypothesis makes use of the assumption of common
variance. For instance, in Example 4, $Y_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, k$.
Let us suppose that $Y_{ij} \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, k$. Then we need to test that
$\sigma_1 = \sigma_2 = \cdots = \sigma_k$ before we can apply Theorem 1. The case $k = 2$ has already
been considered in Section 10.3. For the case where $k > 2$ one can show that a UMP
unbiased test does not exist. A large-sample approximation is described by Lehmann
[62, pp. 376-377]. It is beyond the scope of this book to consider the effects of departures
from the underlying assumptions. We refer the reader to Scheffé [99, Chap. 10]
for a discussion of this topic.


PROBLEMS 12.2 


1. Show that any solution of the normal equations (5) minimizes the sum of squares
$(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$.


2. Show that the least squares estimator given in (6) is an unbiased estimator of $\boldsymbol\beta$.
If the RVs $Y_i$ are uncorrelated with common variance $\sigma^2$, show that the covariance
matrix of the $\hat\beta_i$'s is given by (7).


3. Under the assumption that $\boldsymbol\varepsilon$ [in model (2)] has a multivariate normal distribution
with mean $\mathbf{0}$ and dispersion matrix $\sigma^2 \mathbf{I}_n$, show that the least squares estimators
and the MLEs of $\boldsymbol\beta$ coincide.


4. Prove statements (11) and (12).


5. Determine the expression for the least squares estimator of $\boldsymbol\beta$ subject to $\mathbf{H}\boldsymbol\beta = \mathbf{0}$.


12.3 REGRESSION MODEL 


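In this section we consider a simple linear regression model as a special case of
the general linear hypothesis and show how some inferential questions about the
parameters of the regression equation can be answered. Let $x_1, x_2, \ldots, x_n$ be $n$ given
numbers, and suppose that

$$
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \tag{1}
$$

where $\beta_0, \beta_1$ are unknown parameters and $\varepsilon_i$ are independent normal RVs with
$E\varepsilon_i = 0$ and $\operatorname{var}(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Also, $\sigma^2$ is assumed to be unknown.
Our object is to test hypotheses concerning $\beta_0$ and $\beta_1$ and to construct confidence
intervals for $\beta_0$ and $\beta_1$. Rewriting (1) in the usual fashion, we have

$$
\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon, \tag{2}
$$

where

$$
\boldsymbol\beta = (\beta_0, \beta_1)' \quad\text{and}\quad
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.
$$

Clearly, $Y_1, Y_2, \ldots, Y_n$ are independent normal RVs with $EY_i = \beta_0 + \beta_1 x_i$ and
$\operatorname{var}(Y_i) = \sigma^2$, $i = 1, 2, \ldots, n$, and $\mathbf{Y}$ is an $n$-variate normal random vector with
mean $\mathbf{X}\boldsymbol\beta$ and variance $\sigma^2 \mathbf{I}_n$. The joint PDF of $\mathbf{Y}$ is given by

$$
f(\mathbf{y}; \beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\Bigl\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \Bigr\}. \tag{3}
$$

It easily follows that the MLEs for $\beta_0$, $\beta_1$, and $\sigma^2$ are given by

$$
\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x}, \tag{4}
$$

$$
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \tag{5}
$$

and

$$
\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2, \tag{6}
$$

where $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$.


If we wish to test $H_0 : \beta_1 = 0$, we take $\mathbf{H} = (0, 1)$, so that the model is a special
case of the general linear hypothesis with $k = 2$, $r = 1$. Under $H_0$ the MLEs are

$$
\hat{\hat\beta}_0 = \bar{Y} \tag{7}
$$

and

$$
\hat{\hat\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2. \tag{8}
$$

Thus

$$
\begin{aligned}
F &= \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2} \\
&= \frac{\hat\beta_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2}.
\end{aligned} \tag{9}
$$

From Theorem 12.2.1, the statistic $[(n - 2)/1]F$ has a central $F(1, n - 2)$ distribution
under $H_0$. Since $F(1, n - 2)$ is the square of a $t(n - 2)$, the likelihood ratio
test rejects $H_0$ if

$$
|\hat\beta_1| \Biggl[ \frac{(n - 2) \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2} \Biggr]^{1/2} > c_0, \tag{10}
$$

where $c_0$ is determined from $t$-tables for $n - 2$ d.f.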


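For testing $H_0 : \beta_0 = 0$, we choose $\mathbf{H} = (1, 0)$ so that the model is again a special
case of the general linear hypothesis. In this case

$$
\hat{\hat\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2}
$$

and

$$
\hat{\hat\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{\hat\beta}_1 x_i)^2. \tag{11}
$$

It follows that

$$
F = \frac{\sum_{i=1}^{n} (Y_i - \hat{\hat\beta}_1 x_i)^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2}, \tag{12}
$$

and since

$$
\begin{aligned}
\hat{\hat\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2}
&= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) + n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2} \\
&= \frac{\hat\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 + n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2} \\
&= \hat\beta_1 + \frac{n\hat\beta_0 \bar{x}}{\sum_{i=1}^{n} x_i^2},
\end{aligned} \tag{13}
$$

we can write the numerator of $F$ as

$$
\begin{aligned}
&\sum_{i=1}^{n} (Y_i - \hat{\hat\beta}_1 x_i)^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2 \\
&\quad= \sum_{i=1}^{n} \Bigl( Y_i - \hat\beta_1 x_i + \hat\beta_1 \bar{x} - \bar{Y} + \bar{Y} - \hat\beta_1 \bar{x} - \frac{n\hat\beta_0 \bar{x} x_i}{\sum_{j=1}^{n} x_j^2} \Bigr)^2
 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2 \\
&\quad= \sum_{i=1}^{n} \Bigl( \bar{Y} - \hat\beta_1 \bar{x} - \frac{n\hat\beta_0 \bar{x} x_i}{\sum_{j=1}^{n} x_j^2} \Bigr)^2
 + 2 \sum_{i=1}^{n} (Y_i - \hat\beta_1 x_i + \hat\beta_1 \bar{x} - \bar{Y}) \Bigl( \bar{Y} - \hat\beta_1 \bar{x} - \frac{n\hat\beta_0 \bar{x} x_i}{\sum_{j=1}^{n} x_j^2} \Bigr) \\
&\quad= \frac{\hat\beta_0^2\, n \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} x_i^2}.
\end{aligned} \tag{14}
$$

It follows from Theorem 12.2.1 that the statistic

$$
\frac{\hat\beta_0 \sqrt{n \sum_{i=1}^{n} (x_i - \bar{x})^2 / \sum_{i=1}^{n} x_i^2}}{\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2/(n - 2)}} \tag{15}
$$

has a central $t$-distribution with $n - 2$ d.f. under $H_0 : \beta_0 = 0$. The rejection region is
therefore given by

$$
\frac{|\hat\beta_0| \sqrt{n \sum_{i=1}^{n} (x_i - \bar{x})^2 / \sum_{i=1}^{n} x_i^2}}{\sqrt{\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2/(n - 2)}} > c_0, \tag{16}
$$

where $c_0$ is determined from the tables of the $t(n - 2)$ distribution for a given level of
significance $\alpha$.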


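For testing $H_0 : \beta_0 = \beta_1 = 0$, we choose $\mathbf{H} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ so that the model is again
a special case of the general linear hypothesis with $r = 2$. In this case

$$
\hat{\hat\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 \tag{17}
$$

and

$$
\begin{aligned}
F &= \frac{\sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)^2} \\
&= \frac{n\bar{Y}^2 + \hat\beta_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2} \\
&= \frac{n(\hat\beta_0 + \hat\beta_1 \bar{x})^2 + \hat\beta_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}.
\end{aligned} \tag{18}
$$

From Theorem 12.2.1, the statistic $[(n - 2)/2]F$ has a central $F(2, n - 2)$ distribution
under $H_0 : \beta_0 = \beta_1 = 0$. It follows that the level-$\alpha$ rejection region for $H_0$ is given
by

$$
\frac{n - 2}{2} F > c_0, \tag{19}
$$

where $F$ is given by (18) and $c_0$ is the upper $\alpha$ percent point under the $F(2, n - 2)$
distribution.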


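Remark 1. It is quite easy to modify the analysis above to obtain tests of null
hypotheses $\beta_0 = \beta_0'$, $\beta_1 = \beta_1'$, and $(\beta_0, \beta_1)' = (\beta_0', \beta_1')'$, where $\beta_0'$, $\beta_1'$ are given real
numbers (Problem 4).


Remark 2. The confidence intervals for $\beta_0$, $\beta_1$ are also easily obtained. One can
show that a $(1 - \alpha)$-level confidence interval for $\beta_0$ is given by

$$
\Biggl( \hat\beta_0 - t_{n-2,\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n(n - 2) \sum_{i=1}^{n} (x_i - \bar{x})^2}},\
\hat\beta_0 + t_{n-2,\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n(n - 2) \sum_{i=1}^{n} (x_i - \bar{x})^2}} \Biggr), \tag{20}
$$

and that for $\beta_1$ is given by

$$
\Biggl( \hat\beta_1 - t_{n-2,\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{(n - 2) \sum_{i=1}^{n} (x_i - \bar{x})^2}},\
\hat\beta_1 + t_{n-2,\alpha/2} \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{(n - 2) \sum_{i=1}^{n} (x_i - \bar{x})^2}} \Biggr). \tag{21}
$$

Similarly, one can obtain confidence sets for $(\beta_0, \beta_1)'$ from the likelihood ratio test
of $(\beta_0, \beta_1)' = (\beta_0', \beta_1')'$. It can be shown that the collection of sets of points $(\beta_0, \beta_1)'$
satisfying

$$
\frac{(n - 2)\bigl[ n(\hat\beta_0 - \beta_0)^2 + 2n\bar{x}(\hat\beta_0 - \beta_0)(\hat\beta_1 - \beta_1) + \sum_{i=1}^{n} x_i^2 (\hat\beta_1 - \beta_1)^2 \bigr]}{2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2} \le F_{2,n-2,\alpha} \tag{22}
$$

is a $(1 - \alpha)$-level collection of confidence sets (ellipsoids) for $(\beta_0, \beta_1)'$ centered at
$(\hat\beta_0, \hat\beta_1)'$.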


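Remark 3. Sometimes interest lies in constructing a confidence interval on the
unknown linear regression function $E\{Y \mid x_0\} = \beta_0 + \beta_1 x_0$ for a given value of $x$, or
on a value of $Y$ given $x = x_0$. We assume that $x_0$ is a value of $x$ distinct from
$x_1, x_2, \ldots, x_n$. Clearly, $\hat\beta_0 + \hat\beta_1 x_0$ is the maximum likelihood estimator of $\beta_0 + \beta_1 x_0$. This
is also the best linear unbiased estimator. Let us write $\hat{E}\{Y \mid x_0\} = \hat\beta_0 + \hat\beta_1 x_0$. Then

$$
\hat{E}\{Y \mid x_0\} = \bar{Y} - \hat\beta_1 \bar{x} + \hat\beta_1 x_0,
$$

which is clearly a linear function of normal RVs $Y_i$. It follows that $\hat{E}\{Y \mid x_0\}$ is also
normally distributed with mean $E(\hat\beta_0 + \hat\beta_1 x_0) = \beta_0 + \beta_1 x_0$ and variance

$$
\begin{aligned}
\operatorname{var}(\hat{E}\{Y \mid x_0\}) &= E(\hat\beta_0 - \beta_0 + \hat\beta_1 x_0 - \beta_1 x_0)^2 \\
&= \operatorname{var}(\hat\beta_0) + x_0^2 \operatorname{var}(\hat\beta_1) + 2x_0 \operatorname{cov}(\hat\beta_0, \hat\beta_1) \\
&= \sigma^2 \Bigl[ \frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \Bigr]
\end{aligned} \tag{23}
$$

(see Problem 6). It follows that

$$
\frac{\hat\beta_0 + \hat\beta_1 x_0 - \beta_0 - \beta_1 x_0}{\sigma \bigl\{ (1/n) + \bigl[ (\bar{x} - x_0)^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2 \bigr] \bigr\}^{1/2}} \tag{24}
$$

is $N(0, 1)$. But $\sigma$ is not known, so that we cannot use (24) to construct a confidence
interval for $E\{Y \mid x_0\}$. Since $n\hat\sigma^2/\sigma^2$ is a $\chi^2(n - 2)$ RV and $n\hat\sigma^2/\sigma^2$ is independent
of $\hat\beta_0 + \hat\beta_1 x_0$ (why?), it follows that

$$
\sqrt{n - 2}\; \frac{\hat\beta_0 + \hat\beta_1 x_0 - \beta_0 - \beta_1 x_0}{\hat\sigma \bigl\{ 1 + n(\bar{x} - x_0)^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2 \bigr\}^{1/2}} \tag{25}
$$

has a $t(n - 2)$ distribution. Thus a $(1 - \alpha)$-level confidence interval for $\beta_0 + \beta_1 x_0$ is
given by

$$
\Biggl( \hat\beta_0 + \hat\beta_1 x_0 - t_{n-2,\alpha/2}\, \hat\sigma \sqrt{\frac{n}{n - 2} \Bigl[ \frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \Bigr]},\
\hat\beta_0 + \hat\beta_1 x_0 + t_{n-2,\alpha/2}\, \hat\sigma \sqrt{\frac{n}{n - 2} \Bigl[ \frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \Bigr]} \Biggr). \tag{26}
$$

In a similar manner, one can show (Problem 7) that

$$
\Biggl( \hat\beta_0 + \hat\beta_1 x_0 - t_{n-2,\alpha/2}\, \hat\sigma \sqrt{\frac{n}{n - 2} \Bigl[ \frac{n + 1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \Bigr]},\
\hat\beta_0 + \hat\beta_1 x_0 + t_{n-2,\alpha/2}\, \hat\sigma \sqrt{\frac{n}{n - 2} \Bigl[ \frac{n + 1}{n} + \frac{(\bar{x} - x_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \Bigr]} \Biggr) \tag{27}
$$

is a $(1 - \alpha)$-level confidence interval for $Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon$, that is, for the estimated
value $Y_0$ of $Y$ at $x_0$.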
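The two intervals of Remark 3 differ only in the term $1/n$ versus $(n + 1)/n$ under the square root. The following sketch computes both; the function name `regression_intervals` and the data are hypothetical, and the critical value $t_{3,0.025} = 3.182$ is taken from standard $t$-tables rather than computed.

```python
import math

def regression_intervals(x, y, x0, t_crit):
    """Return the interval for beta0 + beta1*x0 and the interval for a new Y at x0."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    # MLE of sigma (divisor n), as in the text; the factor n/(n-2) corrects it below.
    sigma_hat = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / n)
    fit = b0 + b1 * x0
    d = (xbar - x0) ** 2 / sxx
    scale = t_crit * sigma_hat * math.sqrt(n / (n - 2))
    half_mean = scale * math.sqrt(1 / n + d)         # half-width for the mean response
    half_pred = scale * math.sqrt((n + 1) / n + d)   # half-width for a new observation
    return (fit - half_mean, fit + half_mean), (fit - half_pred, fit + half_pred)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]    # hypothetical observations
mean_ci, pred_ci = regression_intervals(x, y, x0=6.0, t_crit=3.182)
```

Both intervals are centered at $\hat\beta_0 + \hat\beta_1 x_0$, and the interval for a new observation is always the wider of the two because it also accounts for the error $\varepsilon$ in $Y_0$.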


Remark 4. The simple regression model (2) considered above can be general- 
ized in many directions. Thus we may consider EY as a polynomial in x of a degree 
higher than 1, or we may regard EY as a function of several variables. Some of these 
generalizations will be taken up in the problems. 


Remark 5. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate
normal population with parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$,
and $\operatorname{cov}(X, Y) = \rho\sigma_1\sigma_2$. In Section 7.7 we computed the PDF of the sample correlation
coefficient $R$ and showed (Remark 7.7.4) that the statistic

$$
T = R \sqrt{\frac{n - 2}{1 - R^2}} \tag{28}
$$

has a $t(n - 2)$ distribution, provided that $\rho = 0$. If we wish to test $\rho = 0$, that is, the
independence of two jointly distributed normal RVs, we can base a test on the statistic
$T$. Essentially, we are testing that the population covariance is 0, which implies
that the population regression coefficients are 0. Thus we are testing, in particular,
that $\beta_1 = 0$. It is therefore not surprising that (28) is identical with (10). We emphasize
that we derived (28) for a bivariate normal population, but (10) was derived by
taking the $X$'s as fixed and the distribution of $Y$'s as normal. Note that for a bivariate
normal population, $E\{Y \mid x\} = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ is linear, consistent with our
model (1) or (2).


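Example 1. Let us assume that the following data satisfy a linear regression
model:

$$
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.
$$

x    0       1       2       3        4       5
y    0.475   1.007   0.838   -0.618   1.378   0.943


Let us test the null hypothesis that $\beta_1 = 0$. We have

$$
\bar{x} = 2.5, \qquad \sum_{i=0}^{5} (x_i - \bar{x})^2 = 17.5, \qquad \bar{y} = 0.671,
$$

$$
\sum_{i=0}^{5} (x_i - \bar{x})(y_i - \bar{y}) = 0.9985,
$$

$$
\hat\beta_1 = 0.0571, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 0.5279,
$$

$$
\sum_{i=0}^{5} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = 2.3571,
$$

and

$$
|\hat\beta_1| \sqrt{\frac{(n - 2) \sum (x_i - \bar{x})^2}{\sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}} = 0.3106.
$$

Since $t_{n-2,\alpha/2} = t_{4,0.025} = 2.776 > 0.3106$, we accept $H_0$ at level $\alpha = 0.05$.
Let us next find a 95 percent confidence interval for $E\{Y \mid x = 7\}$. This is given
by (26). We have

$$
t_{n-2,\alpha/2}\, \hat\sigma \sqrt{\frac{n}{n - 2} \Bigl[ \frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum (x_i - \bar{x})^2} \Bigr]}
= 2.776 \sqrt{\frac{2.3571}{4} \Bigl( \frac{1}{6} + \frac{20.25}{17.5} \Bigr)} = 2.4518,
$$

$$
\hat\beta_0 + \hat\beta_1 x_0 = 0.5279 + 0.0571 \times 7 = 0.9276,
$$

so that the 95 percent confidence interval is $(-1.5242, 3.3794)$.

(The data were produced from Table ST6, random numbers with $\mu = 0$, $\sigma = 1$,
by letting $\beta_0 = 1$ and $\beta_1 = 0$ so that $E\{Y \mid x\} = \beta_0 + \beta_1 x = 1$, which surely lies in
the interval.)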
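The computations for the test of $\beta_1 = 0$ in Example 1 can be reproduced directly from formulas (4), (5), and (10). This is a minimal sketch: the variable names are illustrative, and the critical value 2.776 is the one quoted in the example.

```python
import math

x = [0, 1, 2, 3, 4, 5]
y = [0.475, 1.007, 0.838, -0.618, 1.378, 0.943]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # slope estimate, formula (5)
b0 = ybar - b1 * xbar          # intercept estimate, formula (4)
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

t_stat = abs(b1) * math.sqrt((n - 2) * sxx / rss)   # left side of (10)
reject = t_stat > 2.776                             # t_{4, 0.025} from the example
```

The values obtained this way agree with those quoted in the example up to the rounding used there, and `reject` is false, matching the decision to accept $H_0$.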


PROBLEMS 12.3 


1. Prove statements (4), (5), and (6).

2. Prove statements (7) and (8).


3. Prove statement (11).


4. Obtain tests of null hypotheses $\beta_0 = \beta_0'$, $\beta_1 = \beta_1'$, and $(\beta_0, \beta_1)' = (\beta_0', \beta_1')'$,
where $\beta_0'$, $\beta_1'$ are given real numbers.


5. Obtain the confidence intervals for $\beta_0$ and $\beta_1$ as given in (20) and (21), respectively.


6. Derive the expression for $\operatorname{var}(\hat{E}\{Y \mid x_0\})$ as given in (23).


7. Show that the interval given in (27) is a $(1 - \alpha)$-level confidence interval for
$Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon$, the estimated value of $Y$ at $x_0$.


8. Suppose that the regression of $Y$ on the (mathematical) variable $x$ is a quadratic

$$
Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,
$$

where $\beta_0, \beta_1, \beta_2$ are unknown parameters, $x_1, x_2, \ldots, x_n$ are known values of $x$,
and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are unobservable RVs that are assumed to be independently
normally distributed with common mean 0 and common variance $\sigma^2$ (see Example 12.2.3).
Assume that the coefficient vectors $(x_1^k, x_2^k, \ldots, x_n^k)$, $k = 0, 1, 2$,
are linearly independent. Write the normal equations for estimating the $\beta$'s and
derive the generalized likelihood ratio test of $\beta_2 = 0$.


9. Suppose that the $Y$'s can be written as

$$
Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i,
$$

where $x_{i1}, x_{i2}, x_{i3}$ are three mathematical variables, and $\varepsilon_i$ are iid $N(0, 1)$ RVs.
Assuming that the matrix $\mathbf{X}$ (see Example 12.2.3) is of full rank, write the normal
equations and derive the likelihood ratio test of the null hypothesis
$H_0 : \beta_1 = \beta_2 = \beta_3$.


10. The following table gives the weight $Y$ (grams) of a crystal suspended in a saturated
solution against the time suspended $T$ (days).


Time, T     0     1     2     3     4     5     6
Weight, Y   0.4   0.7   1.1   1.6   1.9   2.3   2.6


(a) Find the linear regression line of $Y$ on $T$.

(b) Test the hypothesis that $\beta_0 = 0$ in the linear regression model
$Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i$.

(c) Obtain a 0.95 level confidence interval for $\beta_0$.

12.4 ONE-WAY ANALYSIS OF VARIANCE 


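In this section we return to the problem of one-way analysis of variance considered
in Examples 12.2.1 and 12.2.4. Consider the model

$$
Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad j = 1, 2, \ldots, n_i; \quad i = 1, 2, \ldots, k, \tag{1}
$$

as described in Example 12.2.4. In matrix notation we write

$$
\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon, \tag{2}
$$

where

$$
\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})', \qquad
\boldsymbol\beta = (\mu_1, \mu_2, \ldots, \mu_k)',
$$

$$
\mathbf{X} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{1}_{n_2} & \cdots & \mathbf{0} \\
\vdots & \vdots & & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_{n_k}
\end{pmatrix},
$$

and

$$
\boldsymbol\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{k1}, \varepsilon_{k2}, \ldots, \varepsilon_{kn_k})'.
$$

As in Example 12.2.4, $\mathbf{Y}$ is a vector of $n$ observations ($n = \sum_{i=1}^{k} n_i$), whose components
$Y_{ij}$ are subject to random error $\varepsilon_{ij} \sim N(0, \sigma^2)$, $\boldsymbol\beta$ is a vector of $k$ unknown
parameters, and $\mathbf{X}$ is a design matrix. We wish to find a test of
$H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ against all alternatives. We may write $H_0$ in the form $\mathbf{H}\boldsymbol\beta = \mathbf{0}$, where $\mathbf{H}$ is
a $(k - 1) \times k$ matrix of rank $(k - 1)$, which can be chosen to be

$$
\mathbf{H} = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & -1
\end{pmatrix}.
$$

Let us write $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ under $H_0$. The joint PDF of $\mathbf{Y}$ is given by

$$
f(\mathbf{y}; \mu_1, \mu_2, \ldots, \mu_k, \sigma^2) = \Bigl( \frac{1}{2\pi\sigma^2} \Bigr)^{n/2} \exp\Bigl\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2 \Bigr\}, \tag{3}
$$

and under $H_0$ by

$$
f(\mathbf{y}; \mu, \sigma^2) = \Bigl( \frac{1}{2\pi\sigma^2} \Bigr)^{n/2} \exp\Bigl\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu)^2 \Bigr\}. \tag{4}
$$

It is easy to check that the MLEs are

$$
\hat\mu_i = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i} = \bar{Y}_{i\cdot}, \qquad i = 1, 2, \ldots, k, \tag{5}
$$

$$
\hat\sigma^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n}, \tag{6}
$$

$$
\hat{\hat\mu} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}}{n} = \bar{Y}, \tag{7}
$$

and

$$
\hat{\hat\sigma}^2 = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2}{n}. \tag{8}
$$

By Theorem 12.2.1, the likelihood ratio test is to reject $H_0$ if

$$
\frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 - \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2} \cdot \frac{n - k}{k - 1} > F_0, \tag{9}
$$

where $F_0$ is the upper $\alpha$ percent point in the $F(k - 1, n - k)$ distribution. Since

$$
\begin{aligned}
\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2
&= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot} + \bar{Y}_{i\cdot} - \bar{Y})^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{k} n_i (\bar{Y}_{i\cdot} - \bar{Y})^2,
\end{aligned} \tag{10}
$$

we may rewrite (9) as

$$
\frac{\sum_{i=1}^{k} n_i (\bar{Y}_{i\cdot} - \bar{Y})^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 / (n - k)} > F_0. \tag{11}
$$

It is usual to call the sum of squares in the numerator of (11) the between sum of
squares (BSS), and the sum of squares in the denominator of (11) the within sum
of squares (WSS). The results are conveniently displayed in an analysis of variance
table in the following form:


One-Way Analysis of Variance

Source of    Sum of Squares                                                   Degrees of   Mean Sum        F-Ratio
Variation                                                                     Freedom      of Squares
Between      BSS = $\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2$                   k - 1        BSS/(k - 1)     [BSS/(k - 1)] / [WSS/(n - k)]
Within       WSS = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$       n - k        WSS/(n - k)
Mean         $n\bar{Y}^2$                                                      1
Total        TSS = $\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2$                    n


The third row, “Mean,” has been included to make the total of the second column
add up to the total sum of squares (TSS), $\sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2$.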


Example 1. The lifetimes (in hours) of samples from three different brands of
batteries, $Y_1$, $Y_2$, and $Y_3$, were recorded, with the following results:


    $Y_1$    $Y_2$    $Y_3$
    40     60     60
    30     40     50
    50     55     70
    50     65     65
    30            75
                  40


We wish to test whether the three brands have different average lifetimes. We will as- 
sume that the three samples come from normal populations with common (unknown) 
standard deviation o. 

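From the data $n_1 = 5$, $n_2 = 4$, $n_3 = 6$, $n = 15$, and

    $\bar{y}_1 = \frac{200}{5} = 40, \qquad \bar{y}_2 = \frac{220}{4} = 55, \qquad \bar{y}_3 = \frac{360}{6} = 60,$

    $\sum_{i=1}^{5}(y_{1i} - \bar{y}_1)^2 = 400, \qquad \sum_{i=1}^{4}(y_{2i} - \bar{y}_2)^2 = 350, \qquad \sum_{i=1}^{6}(y_{3i} - \bar{y}_3)^2 = 850.$

Also, the grand mean is

    $\bar{y} = \frac{200 + 220 + 360}{15} = \frac{780}{15} = 52.$

Thus

    BSS $= 5(40 - 52)^2 + 4(55 - 52)^2 + 6(60 - 52)^2$
        $= 1140$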


and 


WSS = 400 + 350 + 850 = 1600. 


              Analysis of Variance
Source      SS      d.f.    MSS       F-Ratio
Between     1140     2      570       570/133.33 = 4.28
Within      1600    12      133.33

Choosing $\alpha = 0.05$, we see that $F_0 = F_{2,12,0.05} = 3.89$. Thus we reject
$H_0: \mu_1 = \mu_2 = \mu_3$ at level $\alpha = 0.05$.
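
The arithmetic of Example 1 can be retraced with a short script. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the dictionary layout are illustrative choices, not notation from the text. It computes BSS, WSS, and the F-ratio of (11) for samples of unequal size.

```python
# One-way ANOVA for the battery lifetimes of Example 1 (pure Python).
groups = {
    "Y1": [40, 30, 50, 50, 30],
    "Y2": [60, 40, 55, 65],
    "Y3": [60, 50, 70, 65, 75, 40],
}

def one_way_anova(samples):
    """Return (BSS, WSS, F) for a list of samples of possibly unequal sizes."""
    k = len(samples)                              # number of groups
    n = sum(len(s) for s in samples)              # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between sum of squares: n_i * (group mean - grand mean)^2, summed over groups.
    bss = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within sum of squares: squared deviations from each group's own mean.
    wss = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    f_ratio = (bss / (k - 1)) / (wss / (n - k))
    return bss, wss, f_ratio

bss, wss, f_ratio = one_way_anova(list(groups.values()))
print(f"BSS = {bss:.2f}, WSS = {wss:.2f}, F = {f_ratio:.3f}")
```

The computed F-ratio is then compared with the tabulated value $F_{2,12,0.05} = 3.89$ quoted above; the script does not look up critical values itself.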


Example 2. Three sections of the same elementary statistics course were taught
by three instructors, I, II, and III. The final grades of students were recorded as
follows:


     I     II    III
    95     88     68
    33     78     79
    48     91     91
    76     51     71
    89     85     87
    82     77     68
    60     31     79
    77     62     16
           96     35
           81


Let us test the hypothesis that the average grades given by the three instructors are
the same at level $\alpha = 0.05$.

From the data $n_1 = 8$, $n_2 = 10$, $n_3 = 9$, $n = 27$, $\bar{y}_1 = 70$, $\bar{y}_2 = 74$, $\bar{y}_3 = 66$,
$\sum_{i=1}^{8}(y_{1i} - \bar{y}_1)^2 = 3168$, $\sum_{i=1}^{10}(y_{2i} - \bar{y}_2)^2 = 3686$, $\sum_{i=1}^{9}(y_{3i} - \bar{y}_3)^2 = 4898$.
Also, the grand mean is

    $\bar{y} = \frac{560 + 740 + 594}{27} = \frac{1894}{27} = 70.15.$

Thus

    BSS $= 8(0.15)^2 + 10(3.85)^2 + 9(4.15)^2 = 303.4075$


and 


WSS = 3168 + 3686 + 4898 = 11,752. 


              Analysis of Variance
Source      SS           d.f.   MSS       F-Ratio
Between        303.41     2     151.70    151.70/489.67
Within      11,752.00    24     489.67


We therefore cannot reject the null hypothesis that the average grades given by 
the three instructors are the same. 
PROBLEMS 12.4


1. Prove statements (5), (6), (7), and (8). 
2. The following are the coded values of the amounts of corn (in bushels per acre)
obtained from four varieties, using unequal number of plots for the different
varieties (one row per variety):

    2, 1, 3, 2
    3, 4, 2, 3, 4, 2
    6, 4, 8
    7, 6, 7, 4

Test whether there is a significant difference between the yields of the varieties.


3. A consumer interested in buying a new car has reduced his search to six different
brands: D, F, G, P, V, T. He would like to buy the brand that gives the highest
mileage per gallon of regular gasoline. One of his friends advises him that he
should use some other method of selection, since the average mileages of the six
brands are the same, and offers the following data in support of her assertion.


Distance Traveled (Miles) per Gallon of Gasoline 


Brand 


Car D F G 
1 42 38 28
2 35 33 32 
3 37 28 35 
4 37 37 
5 
6 


Should the consumer accept his friend’s advice? 


25 


24 


4. The following data give the ages of entering freshmen in independent random
samples from three different universities, A, B, and C.


A 


17 
19 
20 
21 
18 


B 


16 
16 
19 
Test the hypothesis that the average ages of entering freshmen at these universities
are the same.


5. Five cigarette manufacturers claim that their product has low tar content. Inde- 
pendent random samples of cigarettes are taken from each manufacturer and the 
following tar levels (in milligrams) are recorded. 


Brand    Tar Level (mg)
A        4.2, 4.8, 4.6, 4.0, 4.4
B        4.9, 4.8, 4.7, 5.0, 4.9, 5.2
C        5.4, 5.3, 5.4, 5.2, 5.5
D        5.8, 5.6, 5.5, 5.4, 5.6, 5.8
E        5.9, 6.2, 6.2, 6.8, 6.4, 6.3


Can the differences among the sample means be attributed to chance? 


6. The quantity of oxygen dissolved in water is used as a measure of water pollu- 
tion. Samples are taken at four locations in a lake and the quantity of dissolved 
oxygen is recorded as follows (lower reading corresponds to greater pollution): 


Location    Quantity of Dissolved Oxygen (%)
A           7.8, 6.4, 8.2, 6.9
B           6.7, 6.8, 7.1, 6.9, 7.3
C           7.2, 7.4, 6.9, 6.4, 6.5
D           6.0, 7.4, 6.5, 6.9, 7.2, 6.8


Do the data indicate a significant difference in the average amount of dissolved 
oxygen for the four locations? 


12.5 TWO-WAY ANALYSIS OF VARIANCE WITH 
ONE OBSERVATION PER CELL 


In many practical problems one is interested in investigating the effects of two fac- 
tors that influence an outcome. For example, the variety of grain and the type of 
fertilizer used both affect the yield of a plot; or the score on a standard examination 
is influenced by the size of the class and the instructor. 

Let us suppose that two factors affect the outcome of an experiment. Suppose also 
that one observation is available at each of a number of levels of these two factors. 
Let $Y_{ij}$ ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$) be the observation when the first factor is
at the $i$th level and the second factor at the $j$th level. Assume that

(1)    $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b,$


where $\alpha_i$ is the effect of the $i$th level of the first factor, $\beta_j$ is the effect of the $j$th level
of the second factor, and $\varepsilon_{ij}$ is the random error, which is assumed to be normally
distributed with mean 0 and variance $\sigma^2$. We will assume that the $\varepsilon_{ij}$'s are independent.
It follows that $Y_{ij}$ are independent normal RVs with means $\mu + \alpha_i + \beta_j$ and
variance $\sigma^2$. There is no loss of generality in assuming that $\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = 0$,
for if $\mu_{ij} = \mu' + \alpha_i' + \beta_j'$, we can write

    $\mu_{ij} = (\mu' + \bar{\alpha}' + \bar{\beta}') + (\alpha_i' - \bar{\alpha}') + (\beta_j' - \bar{\beta}')$
          $= \mu + \alpha_i + \beta_j$

and $\sum_{i=1}^{a}\alpha_i = 0$, $\sum_{j=1}^{b}\beta_j = 0$. Here we have written $\bar{\alpha}'$ and $\bar{\beta}'$ for the means of
$\alpha_i'$'s and $\beta_j'$'s, respectively. Thus $Y_{ij}$ may denote the yield from use of the $i$th variety
of some grain and the $j$th type of some fertilizer. The two hypotheses of interest are

    $\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \quad \text{and} \quad \beta_1 = \beta_2 = \cdots = \beta_b = 0.$


The first of these, for example, says that the first factor has no effect on the outcome 
of the experiment. 


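In view of the fact that $\sum_{i=1}^{a}\alpha_i = 0$ and $\sum_{j=1}^{b}\beta_j = 0$, $\alpha_a = -\sum_{i=1}^{a-1}\alpha_i$,
$\beta_b = -\sum_{j=1}^{b-1}\beta_j$, and we can write our model in matrix notation as

(2)    $Y = X\beta + \varepsilon,$

where

    $Y = (Y_{11}, Y_{12}, \ldots, Y_{1b}, Y_{21}, Y_{22}, \ldots, Y_{2b}, \ldots, Y_{a1}, Y_{a2}, \ldots, Y_{ab})',$
    $\beta = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1})',$
    $\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1b}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2b}, \ldots, \varepsilon_{a1}, \varepsilon_{a2}, \ldots, \varepsilon_{ab})',$

and $X$ is the corresponding design matrix.

The vector of unknown parameters $\beta$ is $(a + b - 1) \times 1$, and the matrix $X$ is
$ab \times (a + b - 1)$ (b blocks of a rows each). We leave the reader to check that
$X$ is of full rank, $a + b - 1$. The hypothesis $H_\alpha: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ or
$H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ can easily be put into the form $H\beta = 0$. For example,
for $H_\beta$ we can choose $H$ to be the $(b - 1) \times (a + b - 1)$ matrix of full rank $b - 1$,
given by

    $H = \begin{pmatrix} 0 & I_{b-1} \end{pmatrix},$

where $0$ is the $(b - 1) \times a$ zero matrix and $I_{b-1}$ is the identity matrix of order $b - 1$.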


Clearly, the model described above is a special case of the general linear hypothesis,
and we can use Theorem 12.2.1 to test $H_\beta$.
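To apply Theorem 12.2.1, we need the estimators $\hat{\mu}_{ij}$ and $\hat{\hat{\mu}}_{ij}$. It is easily checked
that

(3)    $\hat{\mu} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}}{ab} = \bar{y}$

and

(4)    $\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}, \qquad \hat{\beta}_j = \bar{y}_{\cdot j} - \bar{y},$

where $\bar{y}_{i\cdot} = \sum_{j=1}^{b} y_{ij}/b$, $\bar{y}_{\cdot j} = \sum_{i=1}^{a} y_{ij}/a$. Also, under $H_\beta$, for example,

(5)    $\hat{\hat{\mu}} = \bar{y} \quad \text{and} \quad \hat{\hat{\alpha}}_i = \bar{y}_{i\cdot} - \bar{y}.$

In the notation of Theorem 12.2.1, $n = ab$, $k = a + b - 1$, $r = b - 1$, so that
$n - k = ab - a - b + 1 = (a - 1)(b - 1)$, and

(6)    $F = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot})^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}.$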

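Since

(7)    $\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\{(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}) + (\bar{Y}_{\cdot j} - \bar{Y})\}^2$

       $= \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2 + a\sum_{j=1}^{b}(\bar{Y}_{\cdot j} - \bar{Y})^2,$

we may write

(8)    $F = \frac{a\sum_{j=1}^{b}(\bar{Y}_{\cdot j} - \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}.$

It follows that under $H_\beta$, $(a - 1)F$ has a central $F(b - 1, (a - 1)(b - 1))$ distribution.
The numerator of $F$ in (8) measures the variability between the means $\bar{Y}_{\cdot j}$, and
the denominator measures the variability that exists once the effects due to the two
factors have been subtracted.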
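If $H_\alpha$ is the null hypothesis to be tested, one can show that under $H_\alpha$ the MLEs
are

(9)    $\hat{\hat{\mu}} = \bar{y} \quad \text{and} \quad \hat{\hat{\beta}}_j = \bar{y}_{\cdot j} - \bar{y}.$

As before, $n = ab$, $k = a + b - 1$, but $r = a - 1$. Also,

(10)   $F = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{\cdot j})^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2},$

which may be rewritten as

(11)   $F = \frac{b\sum_{i=1}^{a}(\bar{Y}_{i\cdot} - \bar{Y})^2}{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y})^2}.$

It follows that under $H_\alpha$, $(b - 1)F$ has a central $F(a - 1, (a - 1)(b - 1))$ distribution.
The numerator of $F$ in (11) measures the variability between the means $\bar{Y}_{i\cdot}$.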
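If the data are put into the following form:


                              Level of factor 2
                      1         2        ...       b      |  Row mean
              1     $Y_{11}$    $Y_{12}$     ...     $Y_{1b}$   |  $\bar{Y}_{1\cdot}$
  Level       2     $Y_{21}$    $Y_{22}$     ...     $Y_{2b}$   |  $\bar{Y}_{2\cdot}$
  of          .       .         .                  .      |    .
  factor 1    .       .         .                  .      |    .
              a     $Y_{a1}$    $Y_{a2}$     ...     $Y_{ab}$   |  $\bar{Y}_{a\cdot}$
  Column mean       $\bar{Y}_{\cdot 1}$    $\bar{Y}_{\cdot 2}$     ...     $\bar{Y}_{\cdot b}$   |  $\bar{Y}$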


so that the rows represent various levels of factor 1, and the columns, the levels of 
factor 2, one can write 


    between sum of squares for rows $= b\sum_{i=1}^{a}(\bar{Y}_{i\cdot} - \bar{Y})^2$
                                    = sum of squares for factor 1
                                    = SS$_1$.

Similarly,

    between sum of squares for columns $= a\sum_{j=1}^{b}(\bar{Y}_{\cdot j} - \bar{Y})^2$
                                       = sum of squares for factor 2
                                       = SS$_2$.

It is usual to write error or residual sum of squares (SSE) for the denominator of (8)
or (11). These results are conveniently presented in an analysis of variance table as
follows:
      Two-Way Analysis of Variance Table with One Observation per Cell

Source of    Sum of                             Degrees of       Mean
Variation    Squares                            Freedom          Square                          F-Ratio
Rows         SS$_1$                               $a - 1$          MS$_1$ = SS$_1/(a - 1)$             MS$_1$/MSE
Columns      SS$_2$                               $b - 1$          MS$_2$ = SS$_2/(b - 1)$             MS$_2$/MSE
Error        SSE                                $(a - 1)(b - 1)$   MSE = SSE$/[(a - 1)(b - 1)]$
Mean         $ab\bar{Y}^2$                            1                $ab\bar{Y}^2$
Total        $\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2$          $ab$             $\sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2/ab$


Example 1. The following table gives the yield (pounds per plot) of three varieties
of wheat, obtained with four different kinds of fertilizers.


                   Variety of Wheat
Fertilizer       A       B       C
$\alpha$                8       3       7
$\beta$                10       4       8
$\gamma$                6       5       6
$\delta$                8       4       7


Let us test the hypothesis of equality in the average yields of the three varieties of
wheat and the null hypothesis that the four fertilizers are equally effective.
In our notation, $b = 3$, $a = 4$, $\bar{y}_{1\cdot} = 6$, $\bar{y}_{2\cdot} = 7.33$, $\bar{y}_{3\cdot} = 5.67$, $\bar{y}_{4\cdot} = 6.33$,
$\bar{y}_{\cdot 1} = 8$, $\bar{y}_{\cdot 2} = 4$, $\bar{y}_{\cdot 3} = 7$, $\bar{y} = 6.33$.
Also,

    SS$_1$ = sum of squares due to fertilizer
         $= 3[(0.33)^2 + 1^2 + (0.66)^2 + 0^2]$
         $= 4.67;$

    SS$_2$ = sum of squares due to variety of wheat
         $= 4[(1.67)^2 + (2.33)^2 + (0.67)^2]$
         $= 34.67$

and

    SSE $= \sum_{i=1}^{4}\sum_{j=1}^{3}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2$
        $= 7.33.$
The results are shown in the following table:


                    Analysis of Variance
Source               SS       d.f.    MS       F-Ratio
Variety of wheat     34.67     2      17.33    14.2
Fertilizer            4.67     3       1.56     1.28
Error                 7.33     6       1.22
Mean                481.33     1     481.33
Total               528.00    12      44.00


Now $F_{2,6,0.05} = 5.14$ and $F_{3,6,0.05} = 4.76$. Since $14.2 > 5.14$, we reject $H_\beta$, that
there is equality in the average yield of the three varieties; but since $1.28 \not> 4.76$, we
accept $H_\alpha$, that the four fertilizers are equally effective.
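
The sums of squares in this example can be checked numerically. The sketch below is plain Python; the function name `two_way_anova` and the dictionary keys are illustrative, not notation from the text. The first entry of the first row is taken to be 8, the value implied by the stated row mean $\bar{y}_{1\cdot} = 6$ and column mean $\bar{y}_{\cdot 1} = 8$; this is an inference rather than a value read from the table.

```python
# Two-way ANOVA with one observation per cell (wheat yields of Example 1).
# Rows are fertilizers (factor 1, a = 4); columns are wheat varieties (factor 2, b = 3).
# Assumption: the first entry (8) is inferred from the stated row and column means.
y = [
    [8, 3, 7],
    [10, 4, 8],
    [6, 5, 6],
    [8, 4, 7],
]

def two_way_anova(table):
    """Return SS1, SS2, SSE and the two F-ratios MS1/MSE and MS2/MSE."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]
    ss1 = b * sum((r - grand) ** 2 for r in row_means)      # rows (factor 1)
    ss2 = a * sum((c - grand) ** 2 for c in col_means)      # columns (factor 2)
    sse = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(a) for j in range(b))          # residual
    mse = sse / ((a - 1) * (b - 1))
    return {
        "SS1": ss1, "SS2": ss2, "SSE": sse,
        "F_rows": (ss1 / (a - 1)) / mse,
        "F_cols": (ss2 / (b - 1)) / mse,
    }

result = two_way_anova(y)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

Working with unrounded means gives F-ratios of about 14.18 and 1.27, which agree with the tabulated 14.2 and 1.28 up to the rounding used in the table.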


PROBLEMS 12.5 


1. Show that the matrix X for the model defined in (2) is of full rank, a + b — 1. 
2. Prove statements (3), (4), (5), and (9). 


3. The following data represent the units of production per day turned out by four 
different brands of machines used by four machinists: 


                   Machinist
Machine     $A_1$     $A_2$     $A_3$     $A_4$
$B_1$         15      14      19      18
$B_2$         17      12      20      16
$B_3$         16      18      16      17
$B_4$         16      16      15      15


Test whether the differences in the performances of the machinists are signifi- 
cant and also whether the differences in the performances of the four brands of 
machines are significant. Use a = 0.05. 


4. Students were classified into four ability groups, and three different teaching 
methods were employed. The following table gives the mean for four groups: 
              Teaching Method
Ability
Group        A       B       C
1           15      19      14
2           18      17      12
3           22      25      17
4           17      21      19


Test the hypothesis that the teaching methods yield the same results, that is, that 
the teaching methods are equally effective. 


5. The following table shows the yield (pounds per plot) of four varieties of wheat 
obtained with three different kinds of fertilizers. 


                   Variety of Wheat
Fertilizer       A       B       C       D
$\alpha$                8       3       6       7
$\beta$                10       4       5       8
$\gamma$                8       4       6       7


Test the hypotheses that the four varieties of wheat yield the same average yield 
and that the three fertilizers are equally effective. 


12.6 TWO-WAY ANALYSIS OF VARIANCE WITH INTERACTION 


The model described in Section 12.5 assumes that the two factors act independently, 
that is, are additive. In practice, this is an assumption that needs testing. In this sec- 
tion we allow for the possibility that the two factors might jointly affect the outcome; 
that is, there might be interactions. More precisely, if $Y_{ij}$ is the observation in the
$(i, j)$th cell, we will consider the model

(1)    $Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij},$

where $\alpha_i$ ($i = 1, 2, \ldots, a$) represent row effects (or effects due to factor 1), $\beta_j$
($j = 1, 2, \ldots, b$) represent column effects (or effects due to factor 2), and $\gamma_{ij}$ represent
interactions or joint effects. We assume that $\varepsilon_{ij}$ are independently distributed
as $N(0, \sigma^2)$. We assume further that

(2)    $\sum_{i=1}^{a}\alpha_i = 0 = \sum_{j=1}^{b}\beta_j \quad \text{and} \quad \sum_{j=1}^{b}\gamma_{ij} = 0 \ \text{for all } i, \qquad \sum_{i=1}^{a}\gamma_{ij} = 0 \ \text{for all } j.$

The hypothesis of interest is

(3)    $H_0: \gamma_{ij} = 0 \quad \text{for all } i, j.$

One may also be interested in testing that all $\alpha$'s are 0 or that all $\beta$'s are 0 in the
presence of interactions $\gamma_{ij}$.


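We first note that (2) is not restrictive since we can write

    $Y_{ij} = \mu' + \alpha_i' + \beta_j' + \gamma_{ij}' + \varepsilon_{ij},$

where $\alpha_i'$, $\beta_j'$, and $\gamma_{ij}'$ do not satisfy (2), as

    $Y_{ij} = \mu' + \bar{\alpha}' + \bar{\beta}' + \bar{\gamma}' + (\alpha_i' - \bar{\alpha}' + \bar{\gamma}_{i\cdot}' - \bar{\gamma}') + (\beta_j' - \bar{\beta}' + \bar{\gamma}_{\cdot j}' - \bar{\gamma}')$
           $+ (\gamma_{ij}' - \bar{\gamma}_{i\cdot}' - \bar{\gamma}_{\cdot j}' + \bar{\gamma}') + \varepsilon_{ij},$

and then (2) is satisfied by choosing

    $\mu = \mu' + \bar{\alpha}' + \bar{\beta}' + \bar{\gamma}',$
    $\alpha_i = \alpha_i' - \bar{\alpha}' + \bar{\gamma}_{i\cdot}' - \bar{\gamma}',$
    $\beta_j = \beta_j' - \bar{\beta}' + \bar{\gamma}_{\cdot j}' - \bar{\gamma}',$

and

    $\gamma_{ij} = \gamma_{ij}' - \bar{\gamma}_{i\cdot}' - \bar{\gamma}_{\cdot j}' + \bar{\gamma}'.$

Here

    $\bar{\alpha}' = a^{-1}\sum_{i=1}^{a}\alpha_i', \quad \bar{\beta}' = b^{-1}\sum_{j=1}^{b}\beta_j', \quad \bar{\gamma}_{i\cdot}' = b^{-1}\sum_{j=1}^{b}\gamma_{ij}',$

    $\bar{\gamma}_{\cdot j}' = a^{-1}\sum_{i=1}^{a}\gamma_{ij}', \quad \text{and} \quad \bar{\gamma}' = (ab)^{-1}\sum_{i=1}^{a}\sum_{j=1}^{b}\gamma_{ij}'.$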


Next note that unless we replicate, that is, take more than one observation per cell, 
there are no degrees of freedom left to estimate the error SS (see Remark 1). 

Let $Y_{ijs}$ be the $s$th observation when the first factor is at the $i$th level and the
second factor at the $j$th level, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, $s = 1, 2, \ldots, m$ $(> 1)$.
Then the model becomes as follows:

                         Level of Factor 2
Level of
Factor 1        1          2         ...        b
   1          $y_{111}$      $y_{121}$       ...      $y_{1b1}$
                .          .                    .
              $y_{11m}$      $y_{12m}$       ...      $y_{1bm}$
   2          $y_{211}$      $y_{221}$       ...      $y_{2b1}$
                .          .                    .
              $y_{21m}$      $y_{22m}$       ...      $y_{2bm}$
   .
   a          $y_{a11}$      $y_{a21}$       ...      $y_{ab1}$
                .          .                    .
              $y_{a1m}$      $y_{a2m}$       ...      $y_{abm}$

(4)    $Y_{ijs} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijs},$

$i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $s = 1, 2, \ldots, m$, where $\varepsilon_{ijs}$'s are independent
$N(0, \sigma^2)$. We assume that $\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = \sum_{i=1}^{a}\gamma_{ij} = \sum_{j=1}^{b}\gamma_{ij} = 0$.
Suppose that we wish to test $H_\alpha: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$. We leave the reader
to check that model (4) is then a special case of the general linear hypothesis with
$n = abm$, $k = ab$, $r = a - 1$, and $n - k = ab(m - 1)$.


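Let us write

(5)    $\bar{Y} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}}{n}, \qquad \bar{Y}_{i\cdot\cdot} = \frac{\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}}{bm},$

       $\bar{Y}_{ij\cdot} = \frac{\sum_{s=1}^{m} Y_{ijs}}{m}, \qquad \bar{Y}_{\cdot j\cdot} = \frac{\sum_{i=1}^{a}\sum_{s=1}^{m} Y_{ijs}}{am}.$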


Then it can be easily checked that 


(6) 
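It follows from Theorem 12.2.1 that

(7)    $F = \frac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot} + \bar{Y}_{i\cdot\cdot} - \bar{Y})^2 - \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$

Since

    $\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot} + \bar{Y}_{i\cdot\cdot} - \bar{Y})^2 = \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2 + \sum_i\sum_j\sum_s (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2,$

we can write (7) as

(8)    $F = \frac{bm\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$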
Under $H_\alpha$ the statistic $[ab(m - 1)/(a - 1)]F$ has the central $F(a - 1, ab(m - 1))$
distribution, so that the likelihood ratio test rejects $H_\alpha$ if

(9)    $\frac{ab(m-1)}{a-1} \cdot \frac{mb\sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2} > F_{a-1,\,ab(m-1),\,\alpha}.$

A similar analysis holds for testing $H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0$.


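Next consider the test of hypothesis $H_\gamma: \gamma_{ij} = 0$ for all $i, j$, that is, that the two
factors are independent and the effects are additive. In this case, $n = abm$, $k = ab$,
$r = (a - 1)(b - 1)$, and $n - k = ab(m - 1)$. It can be shown that

(10)   $\hat{\hat{\mu}} = \bar{Y}, \qquad \hat{\hat{\alpha}}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}, \qquad \text{and} \qquad \hat{\hat{\beta}}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}.$

Thus

(11)   $F = \frac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2 - \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$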

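Now

    $\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2$
        $= \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot} + \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2$
        $= \sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2 + \sum_i\sum_j\sum_s (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2,$

so that we may write

(12)   $F = \frac{\sum_i\sum_j\sum_s (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2}.$

Under $H_\gamma$, the statistic $\{(m - 1)ab/[(a - 1)(b - 1)]\}F$ has the $F((a - 1)(b - 1),
ab(m - 1))$ distribution. The likelihood ratio test rejects $H_\gamma$ if

(13)   $\frac{(m-1)ab}{(a-1)(b-1)} \cdot \frac{m\sum_i\sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar{Y}_{ij\cdot})^2} > F_{(a-1)(b-1),\,ab(m-1),\,\alpha}.$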

Let us write

    SS$_1$ = sum of squares due to factor 1 (row sum of squares)
         $= bm\sum_{i=1}^{a}(\bar{Y}_{i\cdot\cdot} - \bar{Y})^2,$

    SS$_2$ = sum of squares due to factor 2 (column sum of squares)
         $= am\sum_{j=1}^{b}(\bar{Y}_{\cdot j\cdot} - \bar{Y})^2,$

    SSI = sum of squares due to interaction
        $= m\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y})^2,$

and

    SSE = sum of squares due to error (residual sum of squares)
        $= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m}(Y_{ijs} - \bar{Y}_{ij\cdot})^2.$

Then we may summarize the foregoing results in the following table.


            Two-Way Analysis of Variance Table with Interaction

Source of      Sum of                                  Degrees of
Variation      Squares                                 Freedom           Mean Square                      F-Ratio
Rows           SS$_1$                                    $a - 1$           MS$_1$ = SS$_1/(a - 1)$              MS$_1$/MSE
Columns        SS$_2$                                    $b - 1$           MS$_2$ = SS$_2/(b - 1)$              MS$_2$/MSE
Interaction    SSI                                     $(a - 1)(b - 1)$    MSI = SSI$/[(a - 1)(b - 1)]$       MSI/MSE
Error          SSE                                     $ab(m - 1)$       MSE = SSE$/[ab(m - 1)]$
Mean           $abm\bar{Y}^2$                                1                 $abm\bar{Y}^2$
Total          $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}^2$        $abm$             $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}^2/abm$

Remark 1. Note that if $m = 1$, there are no d.f.'s associated with the SSE. Indeed,
SSE = 0 if $m = 1$. Hence we cannot make tests of hypotheses when $m = 1$,
and for this reason we assume that $m > 1$.


Example 1. To test the effectiveness of three different teaching methods, three
instructors were randomly assigned 12 students each. The students were then randomly
assigned to the different teaching methods and were taught exactly the same
material. At the conclusion of the experiment, identical examinations were given to
the students with the following results in regard to grades:


                  Instructor
Teaching
Method        I       II      III
   1         95       60       86
             85       90       77
             74       80       75
             74       70       70
   2         90       89       83
             80       90       70
             92       91       75
             82       86       72
   3         70       68       74
             80       73       86
             85       78       91
             85       93       89


Then

    SS$_1$ = sum of squares due to methods
         $= bm\sum_{i=1}^{a}(\bar{y}_{i\cdot\cdot} - \bar{y})^2$
         $= 3 \times 4 \times 14.13 = 169.56,$

    SS$_2$ = sum of squares due to instructors
         $= am\sum_{j=1}^{b}(\bar{y}_{\cdot j\cdot} - \bar{y})^2$
         $= 3 \times 4 \times 6.86 = 82.32,$

    SSI = sum of squares due to interaction
        $= m\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y})^2$
        $= 4 \times 140.45 = 561.80,$

and

    SSE = residual sum of squares
        $= \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{s=1}^{4}(y_{ijs} - \bar{y}_{ij\cdot})^2 = 1830.00.$


                 Analysis of Variance
Source          SS         d.f.    MSS       F-Ratio
Methods          169.56     2       84.78    1.25
Instructors       82.32     2       41.16    0.61
Interactions     561.80     4      140.45    2.07
Error           1830.00    27       67.78


With $\alpha = 0.05$, we see from the tables that $F_{2,27,0.05} = 3.35$ and $F_{4,27,0.05} = 2.73$,
so that we cannot reject any of the three hypotheses that the three methods
are equally effective, that the three instructors are equally effective, and that the
interactions are all 0.
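
The four sums of squares can be recomputed from the grade table with a few lines of code. The following is a sketch in plain Python; the function name `two_way_with_interaction` and the nested-list layout are illustrative assumptions. Assuming the table is transcribed correctly, the computed SSI and SSE reproduce the quoted 561.80 and 1830.00 up to rounding, while SS$_1$ and SS$_2$ come out at about 171.56 and 80.89 rather than 169.56 and 82.32. The corresponding F-ratios (about 1.27 and 0.60) stay below 3.35, so the conclusions are unchanged.

```python
# Two-way ANOVA with interaction and m replicates per cell (grades of Example 1).
# data[i][j] holds the m = 4 grades for teaching method i + 1 and instructor j + 1.
data = [
    [[95, 85, 74, 74], [60, 90, 80, 70], [86, 77, 75, 70]],
    [[90, 80, 92, 82], [89, 90, 91, 86], [83, 70, 75, 72]],
    [[70, 80, 85, 85], [68, 73, 78, 93], [74, 86, 91, 89]],
]

def two_way_with_interaction(cells):
    """Return SS1, SS2, SSI, SSE and the three F-ratios for a balanced layout."""
    a, b, m = len(cells), len(cells[0]), len(cells[0][0])
    cell = [[sum(cells[i][j]) / m for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]                    # row means
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]  # column means
    grand = sum(row) / a
    ss1 = b * m * sum((r - grand) ** 2 for r in row)
    ss2 = a * m * sum((c - grand) ** 2 for c in col)
    ssi = m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                  for i in range(a) for j in range(b))
    sse = sum((y - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for y in cells[i][j])
    mse = sse / (a * b * (m - 1))
    return {
        "SS1": ss1, "SS2": ss2, "SSI": ssi, "SSE": sse,
        "F_rows": (ss1 / (a - 1)) / mse,
        "F_cols": (ss2 / (b - 1)) / mse,
        "F_inter": (ssi / ((a - 1) * (b - 1))) / mse,
    }

res = two_way_with_interaction(data)
for name, value in res.items():
    print(f"{name} = {value:.2f}")
```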


PROBLEMS 12.6 


1. Prove statement (6).

2. Obtain the likelihood ratio test of the null hypothesis
$H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0$.

3. Prove statement (10).

4. Suppose that the following data represent the units of production turned out each
day by three different machinists, each working on the same machine for three
different days:


                           Machinist
Machine         A               B               C
$B_1$         15, 15, 17      19, 19, 16      16, 18, 21
$B_2$         17, 17, 17      15, 15, 15      19, 22, 22
$B_3$         15, 17, 16      18, 17, 16      18, 18, 18
$B_4$         18, 20, 22      15, 16, 17      17, 17, 17


Using a 0.05 level of significance, test whether (a) the differences among the
machinists are significant, (b) the differences among the machines are significant,
and (c) the interactions are significant.


5. In an experiment to determine whether four different makes of automobiles
average the same gasoline mileage, a random sample of two cars of each make was
taken from each of four cities. Each car was then test run on 5 gallons of gasoline
of the same brand. The following table gives the number of miles traveled.


                                     Automobile Make
City                 A               B                C                D
Cleveland        92.3, 104.1     90.4, 103.8     110.2, 115.0     120.0, 125.4
Detroit          96.2, 98.6      91.8, 100.4     112.3, 111.7     124.1, 121.1
San Francisco    90.8, 96.2      90.3, 89.1      107.2, 103.8     118.4, 115.6
Denver           98.5, 97.3      96.8, 98.8      115.2, 110.2     126.2, 120.4


Construct the analysis of variance table. Test the hypothesis of no automobile
effect, no city effect, and no interactions. Use $\alpha = 0.05$.


CHAPTER 13 


Nonparametric Statistical Inference 


13.1. INTRODUCTION 


In all the problems of statistical inference considered so far, we assumed that the
distribution of the random variable being sampled is known except, perhaps, for
some parameters. In practice, however, the functional form of the distribution is
seldom, if ever, known. It is therefore desirable to devise methods that are free of this
assumption concerning distribution. In this chapter we study some procedures that
are commonly referred to as distribution-free or nonparametric methods. The term
distribution-free refers to the fact that no assumptions are made about the underlying
distribution except that the distribution function being sampled is absolutely
continuous. The term nonparametric refers to the fact that there are no parameters
involved in the traditional sense of the term parameter used thus far. To be sure,
there is a parameter that indexes the family of absolutely continuous DFs, but it is
not numerical, and hence the parameter set cannot be represented as a subset of $\mathcal{R}_n$
for any $n \geq 1$. The restriction to absolutely continuous distribution functions is a
simplifying assumption that allows us to use the probability integral transformation
(Theorem 5.3.1) and the fact that ties occur with probability 0.

Section 13.2 is devoted to the problem of unbiased (nonparametric) estimation.
We develop the theory of U-statistics since many estimators and test statistics may
be viewed as U-statistics. Sections 13.3 through 13.5 deal with some common
hypothesis-testing problems. In Section 13.6 we investigate applications of order
statistics in nonparametric methods. Section 13.7 considers underlying assumptions
in some common parametric problems and the effect of relaxing these assumptions.


13.2. U-STATISTICS 


In Chapter 7 we encountered several nonparametric estimators. For example, the
empirical DF defined in Section 7.3 as an estimator of the population DF is distribution
free, and so also are the sample moments as estimators of the population moments.
These are examples of what are known as U-statistics, which lead to unbiased
estimators of population characteristics. In this section we study the general theory of
U-statistics. Although the thrust of this investigation is unbiased estimation, many
of the U-statistics defined in this section may be used as test statistics.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common law $\mathcal{L}(X)$, and let $\mathcal{P}$ be the class of
all possible distributions of $X$ that consists of the absolutely continuous or discrete
distributions, or subclasses of these.


Definition 1. A statistic $T(X)$ is sufficient for the family of distributions $\mathcal{P}$ if the
conditional distribution of $X$, given $T = t$, is the same whatever the true $F \in \mathcal{P}$.


Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous
DF, and let $T = (X_{(1)}, \ldots, X_{(n)})$ be the order statistic. Then

    $f(x \mid T = t) = (n!)^{-1},$

and we see that $T$ is sufficient for the family of absolutely continuous distributions
on $\mathcal{R}$.


Definition 2. A family of distributions $\mathcal{P}$ is complete if the only unbiased
estimator of 0 is the zero function itself, that is,

    $E_F h(X) = 0 \ \text{for all } F \in \mathcal{P} \implies h(x) = 0$

for all $x$ (except for a null set with respect to each $F \in \mathcal{P}$).


Definition 3. A statistic T(X) is said to be complete in relation to a class of 
distributions P if the class of induced distributions of T is complete. 


We have already encountered many examples of complete statistics or complete 
families of distributions in Chapter 8. 


The following result is stated without proof. For the proof we refer to Fraser [29, 
pp. 27-30, 139-142]. 


Theorem 1. The order statistic $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is a complete sufficient
statistic provided that the iid RVs $X_1, X_2, \ldots, X_n$ are of either the discrete or
continuous type.


Definition 4. A real-valued parameter $g(F)$ is said to be estimable if it has an
unbiased estimator, that is, if there exists a statistic $T(X)$ such that

(1)    $E_F T(X) = g(F) \quad \text{for all } F \in \mathcal{P}.$


Example 2. If $\mathcal{P}$ is the class of all distributions for which the second moment
exists, $\bar{X}$ is an unbiased estimator of $\mu(F)$, the population mean. Similarly,
$\mu_2(F) = \mathrm{var}_F(X)$ is also estimable, and an unbiased estimator is
$S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n - 1)$. We would like to know whether $\bar{X}$ and $S^2$ are UMVUEs.
Similarly, $F(x)$ and $P_F(X_1 + X_2 > 0)$ are estimable for $F \in \mathcal{P}$.
Definition 5. The degree $m$ ($m \geq 1$) of an estimable parameter $g(F)$ is the smallest
sample size for which the parameter is estimable; that is, it is the smallest $m$ such
that there exists an unbiased estimator $T(X_1, X_2, \ldots, X_m)$ with

    $E_F T = g(F) \quad \text{for all } F \in \mathcal{P}.$


Example 3. The parameter $g(F) = P_F\{X > c\}$, where $c$ is a known constant,
has degree 1. Also, $\mu(F)$ is estimable with degree 1 [we assume that there is at least
one $F \in \mathcal{P}$ such that $\mu(F) \neq 0$], and $\mu_2(F)$ is estimable with degree $m = 2$,
since $\mu_2(F)$ cannot be estimated (unbiasedly) by one observation only. At least two
observations are needed. Similarly, $\mu^2(F)$ has degree 2, and $P(X_1 + X_2 > 0)$ also
is of degree 2.


Definition 6. An unbiased estimator of a parameter based on the smallest sample 
size (equal to degree m) is called a kernel. 


Example 4. Clearly, $X_i$, $1 \leq i \leq n$, is a kernel of $\mu(F)$; $T(X_i) = 1$ if $X_i > c$,
and $= 0$ if $X_i \leq c$, is a kernel of $P(X > c)$. Similarly, $T(X_i, X_j) = 1$ if $X_i + X_j > 0$,
and $= 0$ otherwise, is a kernel of $P(X_i + X_j > 0)$, $X_i X_j$ is a kernel of $\mu^2(F)$, and
$X_i^2 - X_i X_j$ is a kernel of $\mu_2(F)$.


Lemma 1. There exists a symmetric kernel for every estimable parameter. 


Proof. If $T(X_1, X_2, \ldots, X_m)$ is a kernel of $g(F)$, so also is

(2)    $T_s(X_1, X_2, \ldots, X_m) = \frac{1}{m!}\sum_{P} T(X_{i_1}, X_{i_2}, \ldots, X_{i_m}),$

where the summation $P$ is over all $m!$ permutations of $\{1, 2, \ldots, m\}$.


Example 5. A symmetric kernel for $\mu_2(F)$ is

    $T_s(X_i, X_j) = \frac{1}{2}\{T(X_i, X_j) + T(X_j, X_i)\}$
                 $= \frac{1}{2}(X_i - X_j)^2, \qquad i, j = 1, 2, \ldots, n \ (i \neq j).$


Definition 7. Let $g(F)$ be an estimable parameter of degree $m$, and let $X_1, X_2,
\ldots, X_n$ be a sample of size $n$, $n \geq m$. Corresponding to any kernel $T(X_{i_1}, \ldots, X_{i_m})$
of $g(F)$, we define a U-statistic for the sample by

(3)    $U(X_1, X_2, \ldots, X_n) = \binom{n}{m}^{-1}\sum_{C} T_s(X_{i_1}, \ldots, X_{i_m}),$

where the summation $C$ is over all $\binom{n}{m}$ combinations of $m$ integers $(i_1, i_2, \ldots, i_m)$
chosen from $\{1, 2, \ldots, n\}$, and $T_s$ is the symmetric kernel defined in (2).
Clearly, the U-statistic defined in (3) is symmetric in the $X_i$'s, and

(4)    $E_F U(X) = g(F) \quad \text{for all } F.$

Moreover, $U(X)$ is a function of the complete sufficient statistic $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$.
It follows from Theorem 8.4.6 that it is UMVUE of its expected value.


Example 6. For estimating $\mu(F)$, the U-statistic is $n^{-1}\sum_{i=1}^{n} X_i$. For estimating
$\mu_2(F)$, a symmetric kernel is

    $T_s(X_{i_1}, X_{i_2}) = \frac{1}{2}(X_{i_1} - X_{i_2})^2, \qquad i_1, i_2 = 1, 2, \ldots, n \ (i_1 \neq i_2),$

so that the corresponding U-statistic is

    $U(X) = \binom{n}{2}^{-1}\sum_{i_1 < i_2}\frac{1}{2}(X_{i_1} - X_{i_2})^2$
          $= \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
          $= S^2.$

Similarly, for estimating $\mu^2(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}) = X_{i_1} X_{i_2}$, and
the corresponding U-statistic is

    $U(X) = \binom{n}{2}^{-1}\sum_{i < j} X_i X_j = \frac{1}{n(n-1)}\sum_{i \neq j} X_i X_j.$

For estimating $\mu^3(F)$, a symmetric kernel is $T_s(X_{i_1}, X_{i_2}, X_{i_3}) = X_{i_1} X_{i_2} X_{i_3}$, so
that the corresponding U-statistic is

    $U(X) = \binom{n}{3}^{-1}\sum_{i < j < k} X_i X_j X_k = \frac{1}{n(n-1)(n-2)}\sum_{i \neq j \neq k} X_i X_j X_k.$

For estimating $F(x)$ a symmetric kernel is $I_{[X_i \leq x]}$, so the corresponding
U-statistic is

    $U(X) = \frac{1}{n}\sum_{i=1}^{n} I_{[X_i \leq x]} = F_n^*(x),$

and for estimating $P(X > 0)$ the U-statistic is

    $U(X) = \frac{1}{n}\sum_{i=1}^{n} I_{[X_i > 0]} = 1 - F_n^*(0).$

Finally, for estimating $P(X_1 + X_2 > 0)$ the U-statistic is

    $U(X) = \binom{n}{2}^{-1}\sum_{i < j} I_{[X_i + X_j > 0]}.$
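
Definition 7 translates directly into code: symmetrize the kernel as in (2), then average it over all $\binom{n}{m}$ subsets as in (3). The sketch below uses only the Python standard library; the helper names `symmetrize` and `u_statistic` and the five-point sample are hypothetical, chosen for illustration. It checks numerically that the kernel $X_i^2 - X_i X_j$ yields $S^2$ and that the identity kernel yields $\bar{X}$.

```python
from itertools import combinations, permutations
from math import comb, factorial

def symmetrize(kernel, m):
    """Average a kernel over all m! orderings of its arguments, as in (2)."""
    def symmetric(*args):
        return sum(kernel(*p) for p in permutations(args)) / factorial(m)
    return symmetric

def u_statistic(sample, kernel, m):
    """Average the symmetrized kernel over all C(n, m) subsets, as in (3)."""
    ts = symmetrize(kernel, m)
    total = sum(ts(*subset) for subset in combinations(sample, m))
    return total / comb(len(sample), m)

# Hypothetical sample, used only to illustrate the definitions.
x = [2.0, -1.0, 4.0, 0.5, -3.0]
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

u_mean = u_statistic(x, lambda u: u, 1)                 # kernel X_i
u_var = u_statistic(x, lambda u, v: u * u - u * v, 2)   # kernel X_i^2 - X_i X_j
u_pos = u_statistic(x, lambda u, v: 1.0 if u + v > 0 else 0.0, 2)

print(u_mean, xbar)   # the two values agree
print(u_var, s2)      # the two values agree
print(u_pos)          # proportion of pairs with positive sum
```

Enumerating all subsets costs on the order of $\binom{n}{m}$ kernel evaluations, so this direct approach is only practical for small $n$ and $m$; closed forms such as $S^2$ are preferable when available.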


Theorem 2. The variance of the U-statistic defined in (3) is given by

(5)    $\mathrm{var}\, U(X) = \binom{n}{m}^{-1}\sum_{c=1}^{m}\binom{m}{c}\binom{n-m}{m-c}\zeta_c,$

where

    $\zeta_c = \mathrm{cov}_F\left[T_s(X_{i_1}, \ldots, X_{i_m}),\, T_s(X_{j_1}, \ldots, X_{j_m})\right]$

with $m$, the degree of $g(F)$, and $c$ is the common number of integers in the sets
$\{i_1, \ldots, i_m\}$ and $\{j_1, \ldots, j_m\}$. [For $c = 0$, the two statistics $T_s(X_{i_1}, \ldots, X_{i_m})$ and
$T_s(X_{j_1}, \ldots, X_{j_m})$ are independent and have zero covariance.]


Proof. Clearly,

    $\binom{n}{m}^{2}\mathrm{var}\, U(X) = \sum_{C}\sum_{C} E_F\left[\{T_s(X_{i_1}, \ldots, X_{i_m}) - g(F)\}\{T_s(X_{j_1}, \ldots, X_{j_m}) - g(F)\}\right].$

Let $c$ be the number of common integers in $\{i_1, i_2, \ldots, i_m\}$ and $\{j_1, j_2, \ldots, j_m\}$.
Then $c$ takes values $0, 1, \ldots, m$ and for $c = 0$, $T_s(X_{i_1}, \ldots, X_{i_m})$ and
$T_s(X_{j_1}, \ldots, X_{j_m})$ are independent. It follows that

(6)    $\mathrm{var}\, U(X) = \binom{n}{m}^{-2}\sum_{c=0}^{m}\binom{n}{m}\binom{m}{c}\binom{n-m}{m-c}\zeta_c,$

which is (5). The counting argument leading to (6) is as follows: First we select
integers $\{i_1, \ldots, i_m\}$ from $\{1, 2, \ldots, n\}$ in $\binom{n}{m}$ ways. Next we select the integers in
$\{j_1, \ldots, j_m\}$. This is done by selecting first the $c$ integers that will be in $\{i_1, \ldots, i_m\}$
(hence common to both sets), in $\binom{m}{c}$ ways, and then the $m - c$ integers from the
$n - m$ integers which are not in $\{i_1, \ldots, i_m\}$, in $\binom{n-m}{m-c}$ ways. Note that $\zeta_0 = 0$
from independence.


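Example 7. Consider the U-statistic estimator $\bar{X}$ of $g(F) = \mu(F)$ in Example 6.
Here $m = 1$, $T(x) = x$, and $\zeta_1 = \mathrm{var}(X_1) = \sigma^2$ so that $\mathrm{var}(\bar{X}) = \sigma^2/n$.

For the parameter $g(F) = \mu_2(F)$, $U(X) = S^2$. In this case, $m = 2$,
$T_s(X_{i_1}, X_{i_2}) = (X_{i_1} - X_{i_2})^2/2$, so

    $\mathrm{var}\, U(X) = \binom{n}{2}^{-1}\{2(n - 2)\zeta_1 + \zeta_2\},$

where

    $\zeta_2 = E_F\left\{\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2\right]^2\right\} - \sigma^4 = \frac{\mu_4 + \sigma^4}{2}$

and

    $\zeta_1 = \mathrm{cov}\left[\tfrac{1}{2}(X_{i_1} - X_{i_2})^2,\, \tfrac{1}{2}(X_{i_1} - X_{j_2})^2\right],$

where $i_2 \neq j_2$. Then

    $\zeta_1 = \frac{\mu_4 - \sigma^4}{4}$

and

    $\mathrm{var}\, U(X) = \frac{2}{n(n-1)}\left\{2(n - 2)\frac{\mu_4 - \sigma^4}{4} + \frac{\mu_4 + \sigma^4}{2}\right\}$
                  $= \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right),$

which agrees with Corollary 2 to Theorem 7.3.5.

For the parameter $g(F) = F(x)$, $\mathrm{var}\, U(X) = F(x)(1 - F(x))/n$, and for
$g(F) = P_F(X_1 + X_2 > 0)$,

    $\mathrm{var}\, U(X) = \frac{1}{n(n-1)}\{4(n - 2)\zeta_1 + 2\zeta_2\},$

where

    $\zeta_1 = P_F(X_1 + X_2 > 0,\, X_1 + X_3 > 0) - P_F^2(X_1 + X_2 > 0)$

and

    $\zeta_2 = P_F(X_1 + X_2 > 0) - P_F^2(X_1 + X_2 > 0)$
          $= P_F(X_1 + X_2 > 0)\, P_F(X_1 + X_2 \leq 0).$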
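
Formula (5) is easy to evaluate exactly once the covariances $\zeta_c$ are known. The following sketch uses Python's standard `fractions` and `math` modules; the function name `u_statistic_variance` and the choice of standard normal moments ($\sigma^2 = 1$, $\mu_4 = 3$) are assumptions made only for illustration. It confirms that (5) with $m = 2$ reproduces the closed form for $\mathrm{var}(S^2)$ derived above, and that the weights in the counting argument add up to $\binom{n}{m}$.

```python
from fractions import Fraction
from math import comb

def u_statistic_variance(n, m, zeta):
    """Evaluate (5); zeta[c] is the covariance zeta_c for c = 1, ..., m."""
    total = sum(comb(m, c) * comb(n - m, m - c) * zeta[c] for c in range(1, m + 1))
    return Fraction(total) / comb(n, m)

# Illustrative moments (assumed): standard normal, sigma^4 = 1 and mu_4 = 3.
sigma4, mu4 = Fraction(1), Fraction(3)
zeta = {1: (mu4 - sigma4) / 4, 2: (mu4 + sigma4) / 2}

n, m = 10, 2
var_s2 = u_statistic_variance(n, m, zeta)
closed_form = (mu4 - Fraction(n - 3, n - 1) * sigma4) / n

# Counting check: the weights over c = 0, ..., m sum to C(n, m).
weights = sum(comb(m, c) * comb(n - m, m - c) for c in range(m + 1))

print(var_s2, closed_form, weights, comb(n, m))
```

Exact rational arithmetic is used so that the agreement between (5) and the closed form can be tested with equality rather than a tolerance.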


Corollary to Theorem 2. Let $U$ be the U-statistic for a symmetric kernel
$T_s(X_1, X_2, \ldots, X_m)$. Suppose that $E_F[T_s(X_1, \ldots, X_m)]^2 < \infty$. Then

(7)    $\lim_{n \to \infty}\{n\, \mathrm{var}\, U(X)\} = m^2\zeta_1.$
Proof. It is easily shown that $0 \leq \zeta_c \leq \zeta_m$ for $1 \leq c \leq m$. It follows from the
hypothesis $\zeta_m = \mathrm{var}[T_s(X_1, \ldots, X_m)] < \infty$ and (5) that $\mathrm{var}\, U(X) < \infty$. Now

    $n\,\frac{\binom{m}{c}\binom{n-m}{m-c}}{\binom{n}{m}} = \frac{(m!)^2\, n\, [(n - m)!]^2}{c!\,[(m - c)!]^2\, n!\,(n - 2m + c)!}$

    $= \frac{(m!)^2}{c!\,[(m - c)!]^2} \cdot \frac{n(n - m)(n - m - 1)\cdots(n - 2m + c + 1)}{n(n - 1)\cdots(n - m + 1)}.$

Note that the numerator has $m - c + 1$ factors involving $n$, while the denominator
has $m$ such factors so that for $c > 1$, the ratio involving $n$ goes to zero as $n \to \infty$.
For $c = 1$, this ratio $\to 1$ and

    $n\, \mathrm{var}\, U(X) \to \frac{(m!)^2}{[(m - 1)!]^2}\zeta_1 = m^2\zeta_1$

as $n \to \infty$.


Example 8. In Example 7, $n\, \mathrm{var}(\bar{X}) \equiv \sigma^2$ and

    $n\, \mathrm{var}(S^2) \to 2^2\zeta_1 = \mu_4 - \sigma^4$

as $n \to \infty$.


Finally, we state, without proof, the following result due to Hoeffding [42], which 
establishes the asymptotic normality of a suitably centered and normed U-statistic. 
For proof we refer to Lehmann [59, pp. 364-365] or Randles and Wolfe [83, p. 82]. 


Theorem 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$ and let $g(F)$
be an estimable parameter of degree $m$ with symmetric kernel $T_s(X_1, X_2, \ldots, X_m)$.
If $E_F\{T_s(X_1, X_2, \ldots, X_m)\}^2 < \infty$ and $U$ is the U-statistic for $g$ [as defined in
(3)], then $\sqrt{n}\,(U(X) - g(F)) \xrightarrow{L} N(0, m^2\zeta_1)$, provided that

    $\zeta_1 = \mathrm{cov}_F\{T_s(X_{i_1}, \ldots, X_{i_m}),\, T_s(X_{j_1}, \ldots, X_{j_m})\} > 0.$

In view of the corollary to Theorem 2, it follows that $[U - g(F)]/\sqrt{\mathrm{var}(U)} \xrightarrow{L} N(0, 1)$,
provided that $\zeta_1 > 0$.


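Example 9. (Example 7 continued). Clearly, $\sqrt{n}\,(\bar{X} - \mu)/\sigma \xrightarrow{L} \mathcal{N}(0, 1)$ as $n \to \infty$ since $\zeta_1 = \sigma^2 > 0$.
For the parameter $g(F) = \sigma^2(F)$,

$$\operatorname{var} U(\mathbf{X}) = \operatorname{var}(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right), \qquad \zeta_1 = \frac{\mu_4 - \sigma^4}{4},$$

so it follows from Theorem 3 that

$$\sqrt{n}\,(S^2 - \sigma^2) \xrightarrow{L} \mathcal{N}(0, \mu_4 - \sigma^4).$$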


The concept of U-statistic can be extended to multiple random samples. We will restrict ourselves to the case of two samples. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent random samples from DFs $F$ and $G$, respectively.

Definition 8. A parameter $g(F, G)$ is estimable of degrees $(m_1, m_2)$ if $m_1$ and $m_2$ are the smallest sample sizes for which there exists a statistic $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ such that

(8)   $$E_{F,G}\,T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) = g(F, G)$$

for all $F, G \in \mathcal{P}$.

The statistic $T$ in Definition 8 is called a kernel of $g$, and a symmetrized version of $T$, $T_s$, is called a symmetric kernel of $g$. Without loss of generality, therefore, we assume that the two-sample kernel $T$ in (9) is a symmetric kernel.

Definition 9. Let $g(F, G)$, $F, G \in \mathcal{P}$, be an estimable parameter of degree $(m_1, m_2)$. Then a (two-sample) U-statistic estimate of $g$ is defined by

(9)   $$U(\mathbf{X}; \mathbf{Y}) = \binom{n_1}{m_1}^{-1}\binom{n_2}{m_2}^{-1}\sum_{i \in A}\sum_{j \in B} T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}}),$$

where $A$ and $B$ are collections of all subsets of $m_1$ and $m_2$ integers chosen without replacement from the sets $\{1, 2, \ldots, n_1\}$ and $\{1, 2, \ldots, n_2\}$, respectively.


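Example 10. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent samples from DFs $F$ and $G$, respectively. Let

$$g(F, G) = P(X < Y) = \int_{-\infty}^{\infty} F(x)\,g(x)\,dx = \int_{-\infty}^{\infty} P(Y > y)\,f(y)\,dy,$$

where $f$ and $g$ are the respective PDFs of $F$ and $G$. Then

$$T(X_i; Y_j) = \begin{cases} 1 & \text{if } X_i < Y_j, \\ 0 & \text{if } X_i \ge Y_j \end{cases}$$

is an unbiased estimator of $g$. Clearly, $g$ has degree (1, 1) and the two-sample U-statistic is given by

$$U(\mathbf{X}; \mathbf{Y}) = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} T(X_i; Y_j).$$

Theorem 4. The variance of the two-sample U-statistic defined in (9) is given by

(10)   $$\operatorname{var} U(\mathbf{X}; \mathbf{Y}) = \frac{1}{\binom{n_1}{m_1}\binom{n_2}{m_2}}\sum_{c=0}^{m_1}\sum_{d=0}^{m_2}\binom{m_1}{c}\binom{n_1 - m_1}{m_1 - c}\binom{m_2}{d}\binom{n_2 - m_2}{m_2 - d}\,\zeta_{c,d},$$

where $\zeta_{c,d}$ is the covariance between $T(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}})$ and $T(X_{k_1}, \ldots, X_{k_{m_1}}; Y_{l_1}, \ldots, Y_{l_{m_2}})$ with exactly $c$ X's and $d$ Y's in common.

Corollary. Suppose that $E_{F,G}\,T^2(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) < \infty$ for all $F, G \in \mathcal{P}$. Let $N = n_1 + n_2$ and suppose that $n_1, n_2, N \to \infty$ such that $n_1/N \to \lambda$, $n_2/N \to 1 - \lambda$. Then

$$\lim_{N\to\infty} N \operatorname{var} U(\mathbf{X}; \mathbf{Y}) = \frac{m_1^2\,\zeta_{1,0}}{\lambda} + \frac{m_2^2\,\zeta_{0,1}}{1 - \lambda}.$$

The proofs of Theorem 4 and its corollary parallel those of Theorem 2 and its corollary and are left to the reader.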


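Example 11. For the U-statistic in Example 10,

$$E_{F,G}\,U^2(\mathbf{X}; \mathbf{Y}) = \frac{1}{n_1^2 n_2^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{k=1}^{n_1}\sum_{l=1}^{n_2} E_{F,G}\{T(X_i; Y_j)\,T(X_k; Y_l)\}.$$

Now

$$E_{F,G}\{T(X_i; Y_j)\,T(X_k; Y_l)\} = P(X_i < Y_j,\ X_k < Y_l) = \begin{cases} \int_{-\infty}^{\infty} F(x)\,g(x)\,dx & \text{for } i = k,\ j = l, \\ \int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\,dx & \text{for } i = k,\ j \ne l, \\ \int_{-\infty}^{\infty} F^2(x)\,g(x)\,dx & \text{for } i \ne k,\ j = l, \\ \left[\int_{-\infty}^{\infty} F(x)\,g(x)\,dx\right]^2 & \text{for } i \ne k,\ j \ne l, \end{cases}$$

where $f$ and $g$ are PDFs of $F$ and $G$, respectively. Moreover,

$$\zeta_{1,0} = \int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\,dx - [g(F, G)]^2$$

and

$$\zeta_{0,1} = \int_{-\infty}^{\infty} F^2(x)\,g(x)\,dx - [g(F, G)]^2.$$

It follows from (10) with $m_1 = m_2 = 1$ that

$$\operatorname{var} U(\mathbf{X}; \mathbf{Y}) = \frac{1}{n_1 n_2}\left\{g(F, G)[1 - g(F, G)] + (n_2 - 1)\,\zeta_{1,0} + (n_1 - 1)\,\zeta_{0,1}\right\}.$$

In the special case when $F = G$, $g(F, G) = \frac{1}{2}$, $\zeta_{1,0} = \zeta_{0,1} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$, and $\operatorname{var}(U) = (n_1 + n_2 + 1)/(12 n_1 n_2)$.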
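The null variance $(n_1 + n_2 + 1)/(12 n_1 n_2)$ can be verified exactly for small samples. The sketch below is an illustration under stated assumptions, not the book's method: when $F = G$ is continuous, every placement of the X's among the $N = n_1 + n_2$ ordered positions is equally likely, so the code enumerates all $\binom{N}{n_1}$ placements and computes the variance of $U$ in exact rational arithmetic. The function names `mann_whitney_u` and `null_variance` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def mann_whitney_u(xs, ys):
    """Two-sample U-statistic for g(F, G) = P(X < Y) with kernel 1{x < y}."""
    hits = sum(1 for x in xs for y in ys if x < y)
    return Fraction(hits, len(xs) * len(ys))


def null_variance(n1, n2):
    """Exact var U when F = G is continuous, by enumerating every equally
    likely placement of the X's among the n1 + n2 ordered positions."""
    positions = range(n1 + n2)
    values = []
    for x_pos in combinations(positions, n1):
        y_pos = [p for p in positions if p not in x_pos]
        values.append(mann_whitney_u(x_pos, y_pos))
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


n1, n2 = 3, 4
exact = null_variance(n1, n2)
formula = Fraction(n1 + n2 + 1, 12 * n1 * n2)
```

For $n_1 = 3$, $n_2 = 4$ both quantities reduce to the same fraction, in agreement with the special case $F = G$ above.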


Finally, we state, without proof, the two-sample analog of Theorem 3, which es- 
tablishes the asymptotic normality of the two-sample U-statistic defined in (9). 


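Theorem 5. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be independent random samples from DFs $F$ and $G$, respectively, and let $g(F, G)$ be an estimable parameter of degree $(m_1, m_2)$. Let $T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2})$ be a symmetric kernel for $g$ such that $ET^2 < \infty$. Then

$$\sqrt{n_1 + n_2}\,\{U(\mathbf{X}; \mathbf{Y}) - g(F, G)\} \xrightarrow{L} \mathcal{N}(0, \sigma^2),$$

where $\sigma^2 = m_1^2\zeta_{1,0}/\lambda + m_2^2\zeta_{0,1}/(1 - \lambda)$, provided that $\sigma^2 > 0$ and $0 < \lim_{N\to\infty}(n_1/N) = \lambda < 1$, $N = n_1 + n_2$.

In view of (12), we see that $(U - g)/\sqrt{\operatorname{var} U} \xrightarrow{L} \mathcal{N}(0, 1)$, provided that $\sigma^2 > 0$.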


For a proof of Theorem 5 we refer to Lehmann [59, p. 364], or Randles and 
Wolfe [83, p. 92]. 


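Example 11. (Continued). In Example 11 we saw that in the special case when $F = G$, $\zeta_{1,0} = \zeta_{0,1} = \frac{1}{12}$ and $\operatorname{var} U = (n_1 + n_2 + 1)/(12 n_1 n_2)$. It follows from the remark following Theorem 5 that

$$\frac{U(\mathbf{X}; \mathbf{Y}) - \frac{1}{2}}{\sqrt{(n_1 + n_2 + 1)/(12 n_1 n_2)}} \xrightarrow{L} \mathcal{N}(0, 1).$$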


PROBLEMS 13.2 


1. Let $(\mathcal{R}, \mathfrak{B}, P_\theta)$ be a probability space, and let $\mathcal{P} = \{P_\theta\colon \theta \in \Theta\}$. Let $A$ be a Borel subset of $\mathcal{R}$, and consider the parameter $d(\theta) = P_\theta(A)$. Is $d$ estimable? If so, what is the degree? Find the UMVUE for $d$, based on a sample of size $n$, assuming that $\mathcal{P}$ is the class of all continuous distributions.


2. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from two absolutely continuous DFs. Find the UMVUEs of (a) $E(XY)$ and (b) $\operatorname{var}(X + Y)$.

3. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from an absolutely continuous distribution. Find the UMVUEs of (a) $E(XY)$ and (b) $\operatorname{var}(X + Y)$.

4. Let $T(X_1, X_2, \ldots, X_n)$ be a statistic that is symmetric in the observations. Show that $T$ can be written as a function of the order statistic. Conversely, if $T(X_1, X_2, \ldots, X_n)$ can be written as a function of the order statistic, $T$ is symmetric in the observations.


5. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF $F$, $F \in \mathcal{P}$. Find U-statistics for $g_1(F) = \mu^3(F)$ and $g_2(F) = \mu_3(F)$. Find the corresponding expressions for the variance of the U-statistic in each case.


6. In Example 3, show that $\mu_2(F)$ is not estimable with one observation. That is, show that the degree of $\mu_2(F)$, where $F \in \mathcal{P}$, the class of all distributions with finite second moment, is two.

7. Show that for $c = 1, 2, \ldots, m$, $0 \le \zeta_c \le \zeta_m$.

8. Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous DF $F$, $F \in \mathcal{P}$. Let

$$g(F) = E_F|X_1 - X_2|.$$


Find the U-statistic estimator of g(F) and its variance. 


13.3 SOME SINGLE-SAMPLE PROBLEMS 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a DF $F$. In Section 13.2 we studied
properties of U-statistics as nonparametric estimators of parameters $g(F)$. In this
section we consider some nonparametric tests of hypotheses. Often, the test statistic 
may be viewed as a function of a U-statistic. 


13.3.1 Goodness-of-Fit Problem 


The problem of fit is to test the hypothesis that the sample comes from a specified DF $F_0$ against the alternative that it is from some other DF $F$, where $F(x) \neq F_0(x)$ for some $x \in \mathcal{R}$. In Section 10.3 we studied the chi-square test of goodness of fit for testing $H_0\colon X_i \sim F_0$. Here we consider the Kolmogorov-Smirnov test of $H_0$. Since $H_0$ concerns the underlying DF of the X's, it is natural to compare the U-statistic estimator of $g(F) = F(x)$ with the specified DF $F_0$ under $H_0$. The U-statistic for $g(F) = F(x)$ is the empirical DF $F_n^*(x)$.


Definition 1. Let $X_1, X_2, \ldots, X_n$ be a sample from a DF $F$, and let $F_n^*$ be a corresponding empirical DF. The statistic

(1)   $$D_n = \sup_x |F_n^*(x) - F(x)|$$

is called the (two-sided) Kolmogorov-Smirnov statistic. We write

(2)   $$D_n^+ = \sup_x\,[F_n^*(x) - F(x)]$$

and

(3)   $$D_n^- = \sup_x\,[F(x) - F_n^*(x)],$$

and call $D_n^+$, $D_n^-$ the one-sided Kolmogorov-Smirnov statistics.


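Theorem 1. The statistics $D_n$, $D_n^+$, $D_n^-$ are distribution-free for any continuous DF $F$.

Proof. Clearly, $D_n = \max(D_n^+, D_n^-)$. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics of $X_1, X_2, \ldots, X_n$, and define $X_{(0)} = -\infty$, $X_{(n+1)} = +\infty$. Then

$$F_n^*(x) = \frac{i}{n} \quad \text{for } X_{(i)} \le x < X_{(i+1)}, \quad i = 0, 1, 2, \ldots, n,$$

and we have

$$D_n^+ = \max_{0 \le i \le n}\ \sup_{X_{(i)} \le x < X_{(i+1)}}\left\{\frac{i}{n} - F(x)\right\} = \max_{0 \le i \le n}\left\{\frac{i}{n} - \inf_{X_{(i)} \le x < X_{(i+1)}} F(x)\right\} = \max_{0 \le i \le n}\left\{\frac{i}{n} - F(X_{(i)})\right\} = \max\left\{\max_{1 \le i \le n}\left[\frac{i}{n} - F(X_{(i)})\right],\ 0\right\}.$$

Since $F(X_{(i)})$ is the $i$th-order statistic of a sample from $U(0, 1)$ irrespective of what $F$ is, as long as it is continuous, we see that the distribution of $D_n^+$ is independent of $F$. Similarly,

$$D_n^- = \max\left\{\max_{1 \le i \le n}\left[F(X_{(i)}) - \frac{i-1}{n}\right],\ 0\right\},$$

and the result follows.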


Without loss of generality, therefore, we assume that F is the DF of a U(0, 1) RV. 


Theorem 2. If $F$ is continuous, then

(4)   $$P\left\{D_n < v + \frac{1}{2n}\right\} = \begin{cases} 0 & \text{if } v \le 0, \\ \displaystyle\int_{\frac{1}{2n} - v}^{v + \frac{1}{2n}}\int_{\frac{3}{2n} - v}^{v + \frac{3}{2n}}\cdots\int_{\frac{2n-1}{2n} - v}^{v + \frac{2n-1}{2n}} f(u_1, u_2, \ldots, u_n)\prod_{i=1}^{n} du_i & \text{if } 0 < v < \dfrac{2n-1}{2n}, \\ 1 & \text{if } v \ge \dfrac{2n-1}{2n}, \end{cases}$$

where

(5)   $$f(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < \cdots < u_n < 1, \\ 0, & \text{otherwise}, \end{cases}$$

is the joint PDF of the set of order statistics for a sample of size $n$ from $U(0, 1)$.

We will not prove this result here. Let $D_{n,\alpha}$ be the upper $\alpha$-percent point of the distribution of $D_n$, that is, $P\{D_n > D_{n,\alpha}\} \le \alpha$. The exact distribution of $D_n$ for selected values of $n$ and $\alpha$ has been tabulated by Miller [72], Owen [77], and Birnbaum [8]. The large-sample distribution of $D_n$ was derived by Kolmogorov [51], and we state it without proof.


Theorem 3. Let $F$ be any continuous DF. Then for every $z > 0$,

(6)   $$\lim_{n\to\infty} P\{D_n \le z n^{-1/2}\} = L(z),$$

where

(7)   $$L(z) = 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2i^2z^2}.$$

Theorem 3 can be used to find $d_\alpha$ such that $\lim_{n\to\infty} P\{\sqrt{n}\,D_n \le d_\alpha\} = 1 - \alpha$. Tables of $d_\alpha$ for various values of $\alpha$ are also available in Owen [77].
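Where tables are not at hand, $d_\alpha$ can be obtained numerically from the series in (7). The following is a minimal sketch under stated assumptions: the function names, the truncation at 100 terms, and the bisection bracket [0.2, 5] are illustrative choices, not part of the text.

```python
import math


def kolmogorov_cdf(z, terms=100):
    """Limiting DF L(z) = 1 - 2 * sum_{i>=1} (-1)^(i-1) exp(-2 i^2 z^2), z > 0."""
    if z <= 0:
        return 0.0
    s = sum((-1) ** (i - 1) * math.exp(-2.0 * i * i * z * z)
            for i in range(1, terms + 1))
    return 1.0 - 2.0 * s


def d_alpha(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Solve L(d) = 1 - alpha by bisection; L is increasing on (0, infinity)."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For moderate z the series converges after a handful of terms, so the truncation level matters only for z close to 0.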

The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry, and
their common distribution is given by the following theorem.


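Theorem 4. Let $F$ be a continuous DF. Then

(8)   $$P\{D_n^+ < z\} = \begin{cases} 0 & \text{if } z \le 0, \\ \displaystyle\int_{1-z}^{1}\int_{\frac{n-1}{n} - z}^{u_n}\cdots\int_{\frac{2}{n} - z}^{u_3}\int_{\frac{1}{n} - z}^{u_2} f(u_1, u_2, \ldots, u_n)\prod_{i=1}^{n} du_i & \text{if } 0 < z < 1, \\ 1 & \text{if } z \ge 1, \end{cases}$$

where $f$ is given by (5).

We leave the reader to prove Theorem 4.
Tables for the critical values $D_{n,\alpha}^+$, where $P\{D_n^+ > D_{n,\alpha}^+\} \le \alpha$, are also available for selected values of $n$ and $\alpha$ (see Birnbaum and Tingey [7]). Table ST7 gives $D_{n,\alpha}^+$ and $D_{n,\alpha}$ for some selected values of $n$ and $\alpha$. For large samples, Smirnov [106] showed that

(9)   $$\lim_{n\to\infty} P\{\sqrt{n}\,D_n^+ \le z\} = 1 - e^{-2z^2}, \quad z \ge 0.$$

In fact, in view of (9), the statistic $V_n = 4n(D_n^+)^2$ has a limiting $\chi^2(2)$ distribution, for $4n(D_n^+)^2 \le 4z^2$ if and only if $\sqrt{n}\,D_n^+ \le z$, $z \ge 0$, and the result follows since

$$\lim_{n\to\infty} P\{V_n \le 4z^2\} = 1 - e^{-2z^2}, \quad z \ge 0,$$

so that

$$\lim_{n\to\infty} P\{V_n \le x\} = 1 - e^{-x/2}, \quad x \ge 0,$$

which is the DF of a $\chi^2(2)$ RV.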


Example 1. Let $\alpha = 0.01$, and let us approximate $D_{n,\alpha}^+$. We have $\chi^2_{2,0.01} = 9.21$. Thus $V_n = 9.21$, yielding

$$D_{n,0.01}^+ = \sqrt{\frac{9.21}{4n}} = \frac{3.03}{2\sqrt{n}}.$$

If, for example, $n = 9$, then $D_{9,0.01}^+ = 3.03/6 = 0.50$. Of course, the approximation is better for large $n$.

The statistic $D_n$ and its one-sided analogs can be used in testing $H_0\colon X \sim F_0$ against $H_1\colon X \sim F$, where $F_0(x) \neq F(x)$ for some $x$.

Definition 2. To test $H_0\colon F(x) = F_0(x)$ for all $x$ at level $\alpha$, the Kolmogorov-Smirnov test rejects $H_0$ if $D_n > D_{n,\alpha}$. Similarly, it rejects $F(x) \ge F_0(x)$ for all $x$ if $D_n^- > D_{n,\alpha}^+$ and rejects $F(x) \le F_0(x)$ for all $x$ at level $\alpha$ if $D_n^+ > D_{n,\alpha}^+$.




For large samples we can approximate by using Theorem 3 or (9) to obtain an 
approximate a-level test. 


Example 2. Let us consider the data in Example 10.3.3, and apply the Kolmogorov-Smirnov test to determine the goodness of the fit. Rearranging the data in increasing order of magnitude, we have the following result:

      x       F0(x)    F20*(x)   i/20 - F0(x(i))   F0(x(i)) - (i-1)/20
   -1.787    0.0367     1/20         0.0133              0.0367
   -1.229    0.1093     2/20        -0.0093              0.0593
   -0.525    0.2998     3/20        -0.1498              0.1998
   -0.513    0.3050     4/20        -0.1050              0.1550
   -0.508    0.3050     5/20        -0.0550              0.1050
   -0.486    0.3121     6/20        -0.0121              0.0621
   -0.482    0.3156     7/20         0.0344              0.0156
   -0.323    0.3745     8/20         0.0255              0.0245
   -0.261    0.3974     9/20         0.0526             -0.0026
   -0.068    0.4721    10/20         0.0279              0.0221
   -0.057    0.4761    11/20         0.0739             -0.0239
    0.137    0.5557    12/20         0.0443              0.0057
    0.464    0.6772    13/20        -0.0272              0.0772
    0.595    0.7257    14/20        -0.0257              0.0757
    0.881    0.8106    15/20        -0.0606              0.1106
    0.906    0.8186    16/20        -0.0186              0.0686
    1.046    0.8531    17/20        -0.0031              0.0531
    1.237    0.8925    18/20         0.0075              0.0425
    1.678    0.9535    19/20        -0.0035              0.0535
    2.455    0.9931    20/20         0.0069              0.0431

From Theorem 1,

$$D_{20}^- = 0.1998, \quad D_{20}^+ = 0.0739, \quad \text{and} \quad D_{20} = \max(D_{20}^+, D_{20}^-) = 0.1998.$$

Let us take $\alpha = 0.05$. Then $D_{20,0.05} = 0.294$. Since $0.1998 < 0.294$, we accept $H_0$ at the 0.05 level of significance.
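The computation in Example 2 follows directly from the order-statistic formulas in the proof of Theorem 1. The sketch below takes the tabulated column $F_0(x_{(i)})$ as given (so it inherits the rounding of the table) and uses a hypothetical helper `ks_statistics`; the critical value 0.294 is the one quoted in the example.

```python
def ks_statistics(f0_sorted):
    """Return (D_plus, D_minus, D) given F0 evaluated at the order statistics."""
    n = len(f0_sorted)
    d_plus = max(max(i / n - u for i, u in enumerate(f0_sorted, start=1)), 0.0)
    d_minus = max(max(u - (i - 1) / n for i, u in enumerate(f0_sorted, start=1)), 0.0)
    return d_plus, d_minus, max(d_plus, d_minus)


# F0(x_(i)) column from the table of Example 2 (n = 20).
f0 = [0.0367, 0.1093, 0.2998, 0.3050, 0.3050, 0.3121, 0.3156, 0.3745,
      0.3974, 0.4721, 0.4761, 0.5557, 0.6772, 0.7257, 0.8106, 0.8186,
      0.8531, 0.8925, 0.9535, 0.9931]
d_plus, d_minus, d = ks_statistics(f0)
reject = d > 0.294  # D_{20,0.05} as quoted in Example 2
```

Only n differences of each kind need to be examined, because the supremum over x is attained at (or just before) an order statistic.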


It is worthwhile to compare the chi-square test of goodness of fit and the Kolmogorov-Smirnov test. The latter treats individual observations directly, whereas the former discretizes the data and sometimes loses information through grouping. Moreover, the Kolmogorov-Smirnov test is applicable even in the case of very small samples, but the chi-square test is essentially for large samples.
The chi-square test can be applied when the data are discrete or continuous, but the Kolmogorov-Smirnov test assumes continuity of the DF. This means that the latter test provides a more refined analysis of the data. If the distribution is actually discontinuous, the Kolmogorov-Smirnov test is conservative in that it favors $H_0$.

We next turn our attention to some other uses of the Kolmogorov-Smirnov statistic. Let $X_1, X_2, \ldots, X_n$ be a sample from a DF $F$, and let $F_n^*$ be the sample DF. The estimate $F_n^*$ of $F$ for large $n$ should be close to $F$. Indeed,

(10)   $$P\left\{|F_n^*(x) - F(x)| \le \lambda\sqrt{\frac{F(x)[1 - F(x)]}{n}}\right\} \ge 1 - \frac{1}{\lambda^2},$$

and since $F(x)[1 - F(x)] \le \frac{1}{4}$, we have

(11)   $$P\left\{|F_n^*(x) - F(x)| \le \frac{\lambda}{2\sqrt{n}}\right\} \ge 1 - \frac{1}{\lambda^2}.$$

Thus $F_n^*$ can be made close to $F$ with high probability by choosing $\lambda$ and large enough $n$. The Kolmogorov-Smirnov statistic enables us to determine the smallest $n$ such that the error in estimation never exceeds a fixed value $\varepsilon$ with a large probability $1 - \alpha$. Since

(12)   $$P\{D_n \le \varepsilon\} \ge 1 - \alpha,$$

$\varepsilon = D_{n,\alpha}$; and given $\varepsilon$ and $\alpha$, we can read $n$ from the tables. For large $n$ we can use the asymptotic distribution of $D_n$ and solve $d_\alpha = \varepsilon\sqrt{n}$ for $n$.
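As a hedged numerical illustration of the last step, take $\alpha = 0.05$ and a hypothetical tolerance $\varepsilon = 0.1$. The value $d_{0.05} \approx 1.358$ used here is the standard 95 percent point of $L(z)$ in (7); it is an assumption in the sense that it is not quoted in the text above.

```latex
n \;\ge\; \left(\frac{d_\alpha}{\varepsilon}\right)^{2}
  = \left(\frac{1.358}{0.1}\right)^{2} \approx 184.4,
\qquad \text{so } n = 185 .
% For comparison, the pointwise bound (11) with 1/\lambda^2 = \alpha requires
% \lambda/(2\sqrt{n}) \le \varepsilon, that is,
n \;\ge\; \frac{1}{4\alpha\varepsilon^{2}} = \frac{1}{4(0.05)(0.01)} = 500 .
```

The Chebyshev-type bound (11) therefore asks for more observations and controls the error only at a single $x$, whereas the bound based on $D_n$ holds simultaneously for all $x$.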

We can also form confidence bounds for $F$. Given $\alpha$ and $n$, we first find $D_{n,\alpha}$ such that

(13)   $$P\{D_n > D_{n,\alpha}\} \le \alpha,$$

which is the same as

$$P\left\{\sup_x |F_n^*(x) - F(x)| \le D_{n,\alpha}\right\} \ge 1 - \alpha.$$

Thus

(14)   $$P\{|F_n^*(x) - F(x)| \le D_{n,\alpha} \text{ for all } x\} \ge 1 - \alpha.$$

Define

(15)   $$L_n(x) = \max\{F_n^*(x) - D_{n,\alpha},\ 0\}$$

and

(16)   $$U_n(x) = \min\{F_n^*(x) + D_{n,\alpha},\ 1\}.$$

Then the region between $L_n(x)$ and $U_n(x)$ can be used as a confidence band for $F(x)$ with associated confidence coefficient $1 - \alpha$.

Example 3. For the data on the standard normal distribution of Example 2, let us form a 0.90 confidence band for the DF. We have $D_{20,0.10} = 0.265$. The confidence band is, therefore, $F_{20}^*(x) \pm 0.265$ as long as the band is between 0 and 1.
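The band defined by (15) and (16) is straightforward to evaluate at any point. The following sketch, assuming the hypothetical helper name `ks_confidence_band`, uses the 20 observations of Example 2 and the critical value $D_{20,0.10} = 0.265$ quoted in Example 3.

```python
from bisect import bisect_right


def ks_confidence_band(sample, d_crit):
    """Return a function x -> (L_n(x), U_n(x)) implementing (15) and (16)."""
    xs = sorted(sample)
    n = len(xs)

    def band(x):
        ecdf = bisect_right(xs, x) / n  # empirical DF: proportion of x_i <= x
        return max(ecdf - d_crit, 0.0), min(ecdf + d_crit, 1.0)

    return band


# The 20 ordered observations listed in Example 2.
data = [-1.787, -1.229, -0.525, -0.513, -0.508, -0.486, -0.482, -0.323,
        -0.261, -0.068, -0.057, 0.137, 0.464, 0.595, 0.881, 0.906,
        1.046, 1.237, 1.678, 2.455]
band = ks_confidence_band(data, 0.265)
lower, upper = band(0.0)  # eleven observations are <= 0, so the empirical DF is 0.55
```

The truncation at 0 and 1 in (15) and (16) is what keeps the band inside the range of a DF in the tails.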


13.3.2 Problem of Location 


Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from some unknown DF $F$. Let $p$ be a positive real number, $0 < p < 1$, and let $\mathfrak{z}_p(F)$ denote the quantile of order $p$ for the DF $F$. In the following analysis we assume that $F$ is absolutely continuous. The problem of location is to test $H_0\colon \mathfrak{z}_p(F) = \mathfrak{z}_0$, $\mathfrak{z}_0$ a given number, against one of the alternatives $\mathfrak{z}_p(F) > \mathfrak{z}_0$, $\mathfrak{z}_p(F) < \mathfrak{z}_0$, and $\mathfrak{z}_p(F) \neq \mathfrak{z}_0$. The problem of location and symmetry is to test $H_0\colon \mathfrak{z}_{0.5}(F) = \mathfrak{z}_0$, and $F$ is symmetric against $H_1\colon \mathfrak{z}_{0.5}(F) \neq \mathfrak{z}_0$ or $F$ is not symmetric.
We consider two tests of location. First, we describe the sign test. 


Sign Test 
Let $X_1, X_2, \ldots, X_n$ be iid RVs with common PDF $f$. Consider the hypothesis-testing problem

(17)   $$H_0\colon \mathfrak{z}_p(f) = \mathfrak{z}_0 \quad \text{against} \quad H_1\colon \mathfrak{z}_p(f) > \mathfrak{z}_0,$$

where $\mathfrak{z}_p(f)$ is the quantile of order $p$ of PDF $f$, $0 < p < 1$. Let $g(F) = P(X_i > \mathfrak{z}_0) = P(X_i - \mathfrak{z}_0 > 0)$. Then the corresponding U-statistic is given by

$$nU(\mathbf{X}) = R^+(\mathbf{X}),$$

the number of positive elements in $X_1 - \mathfrak{z}_0, X_2 - \mathfrak{z}_0, \ldots, X_n - \mathfrak{z}_0$. Clearly, $P(X_i = \mathfrak{z}_0) = 0$. Fraser [29, pp. 167-170] has shown that a UMP test of $H_0$ against $H_1$ is given by

(18)   $$\varphi(\mathbf{x}) = \begin{cases} 1, & R^+(\mathbf{x}) > c, \\ \gamma, & R^+(\mathbf{x}) = c, \\ 0, & R^+(\mathbf{x}) < c, \end{cases}$$

where $c$ and $\gamma$ are chosen from the size restriction

(19)   $$\alpha = \sum_{i=c+1}^{n}\binom{n}{i}(1 - p)^{i}p^{\,n-i} + \gamma\binom{n}{c}(1 - p)^{c}p^{\,n-c}.$$

Note that under $H_0$, $\mathfrak{z}_p(f) = \mathfrak{z}_0$, so that $P_{H_0}(X \le \mathfrak{z}_0) = p$, and $R^+(\mathbf{X}) \sim b(n, 1 - p)$. The same test is UMP for $H_0\colon \mathfrak{z}_p(f) \le \mathfrak{z}_0$ against $H_1\colon \mathfrak{z}_p(f) > \mathfrak{z}_0$. For the two-sided case, Fraser [29, p. 171] shows that the two-sided sign test is UMP unbiased.
If, in particular, $\mathfrak{z}_0$ is the median of $f$, then $p = \frac{1}{2}$ under $H_0$. In this case one can also use the sign test to test $H_0\colon \operatorname{med}(X) = \mathfrak{z}_0$, $F$ is symmetric.
For large $n$ one can use the normal approximation to binomial to find $c$ and $\gamma$ in (19).

Example 4. Entering college freshmen have taken a particular high school achievement test for many years, and the upper quartile ($p = 0.75$) is well established at a score of 195. A particular high school sent 12 of its graduates to college, where they took the examination and obtained scores of 203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216. Let us test the null hypothesis $H_0$ that $\mathfrak{z}_{0.75} \le 195$ against $H_1\colon \mathfrak{z}_{0.75} > 195$ at the $\alpha = 0.05$ level.

We have to find $c$ and $\gamma$ such that

$$\sum_{r=c+1}^{12}\binom{12}{r}\left(\frac{1}{4}\right)^{r}\left(\frac{3}{4}\right)^{12-r} + \gamma\binom{12}{c}\left(\frac{1}{4}\right)^{c}\left(\frac{3}{4}\right)^{12-c} = 0.05.$$

From the table of cumulative binomial distribution (Table ST1) for $n = 12$, $p = \frac{1}{4}$, we see that $c = 6$. Then $\gamma$ is given by

$$0.0142 + \gamma\binom{12}{6}\left(\frac{1}{4}\right)^{6}\left(\frac{3}{4}\right)^{6} = 0.05.$$

Thus

$$\gamma = \frac{0.0358}{0.0402} = 0.89.$$

In our case the number of positive signs, $x_i - 195$, $i = 1, 2, \ldots, 12$, is 7, so we reject $H_0$ that the upper quartile is $\le 195$.
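The constants $c$ and $\gamma$ in the size restriction (19) can also be computed directly instead of being read from Table ST1. The following is a minimal sketch under stated assumptions (the function name `sign_test_critical` is hypothetical, and exact binomial probabilities replace the rounded table entries), applied to the data of Example 4.

```python
from math import comb


def sign_test_critical(n, p, alpha):
    """Find (c, gamma) with P(R > c) + gamma * P(R = c) = alpha, where
    R ~ b(n, 1 - p) is the number of observations above z0 under H0."""
    q = 1.0 - p  # P(X > z0) when z0 is the quantile of order p
    pmf = [comb(n, r) * q ** r * p ** (n - r) for r in range(n + 1)]
    tail = 0.0   # running value of P(R > c), accumulated from the top
    for c in range(n, -1, -1):
        if tail + pmf[c] > alpha:
            return c, (alpha - tail) / pmf[c]
        tail += pmf[c]
    raise ValueError("alpha must be smaller than 1")


scores = [203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216]
r_plus = sum(1 for x in scores if x > 195)        # number of positive signs
c, gamma = sign_test_critical(12, 0.75, 0.05)     # Example 4 settings
reject = r_plus > c
```

Because the exact probabilities are used, $\gamma$ agrees with the value in Example 4 to two decimal places rather than digit for digit.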


Example 5. A random sample of size 8 is taken from a normal population with mean 0 and variance 1. The sample values are -0.465, 0.120, -0.238, -0.869, -1.016, 0.417, 0.056, 0.561. Let us test hypothesis $H_0\colon \mu = -1.0$ against $H_1\colon \mu > -1.0$. We should expect to reject $H_0$ since we know that it is false. The number of observations, $x_i - \mu_0 = x_i + 1.0$, that are $> 0$ is 7. We have to find $c$ and $\gamma$ such that

$$\sum_{r=c+1}^{8}\binom{8}{r}\left(\frac{1}{2}\right)^{8} + \gamma\binom{8}{c}\left(\frac{1}{2}\right)^{8} = 0.05,$$

that is,

$$\sum_{r=c+1}^{8}\binom{8}{r} + \gamma\binom{8}{c} = 12.8.$$

We see that $c = 6$ and $\gamma = 0.13$. Since the number of positive $x_i - \mu_0$ is $> 6$, we reject $H_0$.
Let us now apply the parametric test here. We have $\sum_{i=1}^{8} x_i = -1.434$; that is, $\bar{x} = -0.179$. Since $\sigma = 1$, we reject $H_0$ if

$$\bar{x} > \mu_0 + \frac{z_\alpha}{\sqrt{n}} = -1.0 + \frac{1.64}{\sqrt{8}} = -0.42.$$

Since $-0.179 > -0.42$, we reject $H_0$.


The single-sample sign test described above can easily be modified to apply to sampling from a bivariate population. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate population. Let $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$, and assume that $Z_i$ has an absolutely continuous DF. Then one can test hypotheses concerning the order parameters of $Z$ by using the sign test. A hypothesis of interest here is that $Z$ has a given median $\mathfrak{z}_0$. Without loss of generality, let $\mathfrak{z}_0 = 0$. Then $H_0\colon \operatorname{med}(Z) = 0$; that is, $P\{Z > 0\} = P\{Z < 0\} = \frac{1}{2}$. Note that $\operatorname{med}(Z)$ is not necessarily equal to $\operatorname{med}(X) - \operatorname{med}(Y)$, so that $H_0$ is not that $\operatorname{med}(X) = \operatorname{med}(Y)$ but that $\operatorname{med}(Z) = 0$. The sign test is UMP against one-sided alternatives and UMP unbiased against two-sided alternatives.


Example 6. We consider an example due to Hahn and Nelson [37], in which two measuring devices take readings on each of 10 test units. Let $X$ and $Y$, respectively, be the readings on a test unit by the first and second measuring devices. Let $X = A + \varepsilon_1$, $Y = A + \varepsilon_2$, where $A$, $\varepsilon_1$, $\varepsilon_2$, respectively, are the contributions to the readings due to the test unit and to the first and second measuring devices. Let $A$, $\varepsilon_1$, $\varepsilon_2$ be independent with $EA = \mu$, $\operatorname{var}(A) = \sigma^2$, $E\varepsilon_1 = E\varepsilon_2 = 0$, $\operatorname{var}(\varepsilon_1) = \sigma_1^2$, $\operatorname{var}(\varepsilon_2) = \sigma_2^2$, so that $X$ and $Y$ have common mean $\mu$ and variances $\sigma_1^2 + \sigma^2$ and $\sigma_2^2 + \sigma^2$, respectively. Also, the covariance between $X$ and $Y$ is $\sigma^2$. The data are as follows:

Test Unit            1     2     3     4     5     6     7     8     9    10
First device, X     71   108    72   140    61    97    90   127   101   114
Second device, Y    77   105    71   152    88   117    93   130   112   105
Z = X - Y           -6     3     1    -8   -17   -20    -3    -3   -11     9

Let us test the hypothesis $H_0\colon \operatorname{med}(Z) = 0$. The number of $Z_i$'s $> 0$ is 3. We have

$$P\{\text{number of } Z_i\text{'s} > 0 \text{ is} \le 3 \mid H_0\} = \sum_{k=0}^{3}\binom{10}{k}\left(\frac{1}{2}\right)^{10} = 0.172.$$

Using the two-sided sign test, we cannot reject $H_0$ at level $\alpha = 0.05$, since $0.172 > 0.025$. The RVs $Z_i$ can be considered to be distributed normally, so that under $H_0$ the common mean of $Z_i$'s is 0. Using a paired comparison $t$-test on the data, we can show that $t = -0.88$ for 9 d.f., so we cannot reject the hypothesis of equality of means of $X$ and $Y$ at level $\alpha = 0.05$.


Finally, we consider the Wilcoxon signed-ranks test. 


Wilcoxon Signed-Ranks Test 

The sign test for median and symmetry loses information since it ignores the magnitude of the difference between the observations and the hypothesized median. The Wilcoxon signed-ranks test provides an alternative test of location (and symmetry) that also takes into account the magnitudes of these differences.

Let $X_1, X_2, \ldots, X_n$ be iid RVs with common absolutely continuous DF $F$, which is symmetric about the median $\mathfrak{z}_{1/2}$. The problem is to test $H_0\colon \mathfrak{z}_{1/2} = \mathfrak{z}_0$ against the usual one- or two-sided alternatives. Without loss of generality, we assume that $\mathfrak{z}_0 = 0$. Then $F(-x) = 1 - F(x)$ for all $x \in \mathcal{R}$. To test $H_0\colon F(0) = \frac{1}{2}$ or $\mathfrak{z}_{1/2} = 0$, we first arrange $|X_1|, |X_2|, \ldots, |X_n|$ in increasing order of magnitude and assign ranks $1, 2, \ldots, n$, keeping track of the original signs of $X_i$. For example, if $n = 4$ and $|X_2| < |X_4| < |X_1| < |X_3|$, the rank of $|X_1|$ is 3, of $|X_2|$ is 1, of $|X_3|$ is 4, and of $|X_4|$ is 2.

Let

(20)   $$T^+ = \text{sum of the ranks of positive } X_i\text{'s}, \qquad T^- = \text{sum of the ranks of negative } X_i\text{'s}.$$

Then, under $H_0$, we expect $T^+$ and $T^-$ to be the same. Note that

(21)   $$T^+ + T^- = \frac{n(n+1)}{2},$$

so that $T^+$ and $T^-$ are linearly related and offer equivalent criteria. Let us define

(22)   $$Z_i = \begin{cases} 1 & \text{if } X_i > 0, \\ 0 & \text{if } X_i < 0, \end{cases} \qquad i = 1, 2, \ldots, n,$$

and write $R(|X_i|) = R_i^+$ for the rank of $|X_i|$. Then $T^+ = \sum_{i=1}^{n} R_i^+ Z_i$ and $T^- = \sum_{i=1}^{n} (1 - Z_i)R_i^+$. Also,

(23)   $$T^+ - T^- = -\sum_{i=1}^{n} R_i^+ + 2\sum_{i=1}^{n} Z_i R_i^+ = 2\sum_{i=1}^{n} Z_i R_i^+ - \frac{n(n+1)}{2}.$$

The statistic $T^+$ (or $T^-$) is known as the Wilcoxon statistic. A large value of $T^+$ (or, equivalently, a small value of $T^-$) means that most of the large deviations from 0 are positive, and therefore we reject $H_0$ in favor of the alternative, $H_1\colon \mathfrak{z}_{1/2} > 0$.
A similar analysis applies to the other two alternatives. We record the results as follows:

   H0               H1                 Reject H0 if:
   z_{1/2} = 0      z_{1/2} > 0        T+ > c1
   z_{1/2} = 0      z_{1/2} < 0        T+ < c2
   z_{1/2} = 0      z_{1/2} != 0       T+ < c3 or T+ > c4


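We now show how the Wilcoxon signed-ranks test statistic is related to the U-statistic estimate of $g_2(F) = P_F(X_1 + X_2 > 0)$. Recall from Example 13.2.6 that the corresponding U-statistic is

(24)   $$U_2(\mathbf{X}) = \binom{n}{2}^{-1}\sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}.$$

First note that

(25)   $$\sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]} = \sum_{j=1}^{n} I_{[X_j > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]}.$$

Next note that for $i < j$, $X_{(i)} + X_{(j)} > 0$ if and only if $X_{(j)} > 0$ and $|X_{(i)}| < |X_{(j)}|$. It follows that $\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]}$ is the signed rank of $X_{(j)}$. Consequently,

(26)   $$T^+ = \sum_{j=1}^{n}\sum_{i=1}^{j} I_{[X_{(i)} + X_{(j)} > 0]} = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 0]} = \sum_{j=1}^{n} I_{[X_j > 0]} + \sum_{1 \le i < j \le n} I_{[X_i + X_j > 0]} = nU_1(\mathbf{X}) + \binom{n}{2}U_2(\mathbf{X}),$$

where $U_1$ is the U-statistic for $g_1(F) = P_F(X_1 > 0)$.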

We next compute the distribution of $T^+$ for small samples. The distribution of $T^+$ is tabulated by Kraft and Van Eeden [53, pp. 221-223].

Let

$$Z_{(i)} = \begin{cases} 1 & \text{if the } X_j \text{ whose absolute value } |X_j| \text{ has rank } i \text{ is} > 0, \\ 0 & \text{otherwise}. \end{cases}$$

Note that $T^+ = 0$ if all differences have negative signs, and $T^+ = n(n+1)/2$ if all differences have positive signs. Here a difference means a difference between the observations and the postulated value of the median. $T^+$ is completely determined by the indicators $Z_{(i)}$, so that the sample space can be considered as a set of $2^n$ $n$-tuples $(z_1, z_2, \ldots, z_n)$, where each $z_i$ is 0 or 1. Under $H_0$, $\mathfrak{z}_{1/2} = \mathfrak{z}_0$ and each arrangement is equally likely. Thus

(27)   $$P_{H_0}\{T^+ = t\} = \frac{\{\text{number of ways to assign} + \text{or} - \text{signs to integers } 1, 2, \ldots, n \text{ so that the sum of the positively signed integers is } t\}}{2^n} = \frac{n(t)}{2^n}, \text{ say}.$$

Note that every assignment has a conjugate assignment with plus and minus signs interchanged so that for this conjugate, $T^+$ is given by

(28)   $$\sum_{i=1}^{n} i(1 - Z_{(i)}) = \frac{n(n+1)}{2} - \sum_{i=1}^{n} iZ_{(i)}.$$

Thus under $H_0$ the distribution of $T^+$ is symmetric about the mean $n(n+1)/4$.


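Example 7. Let us compute the null distribution for $n = 3$. $E_{H_0}T^+ = n(n+1)/4 = 3$, and $T^+$ takes values from 0 to $n(n+1)/2 = 6$:

   Value of T+    Ranks Associated with Positive Differences    n(t)
        6                       1, 2, 3                           1
        5                       2, 3                              1
        4                       1, 3                              1
        3                       1, 2; 3                           2

so that

(29)   $$P_{H_0}\{T^+ = t\} = \begin{cases} \frac{1}{8}, & t = 4, 5, 6, 0, 1, 2, \\ \frac{2}{8}, & t = 3, \\ 0, & \text{otherwise}. \end{cases}$$

Similarly, for $n = 4$,

(30)   $$P_{H_0}\{T^+ = t\} = \begin{cases} \frac{1}{16}, & t = 0, 1, 2, 8, 9, 10, \\ \frac{2}{16}, & t = 3, 4, 5, 6, 7, \\ 0, & \text{otherwise}. \end{cases}$$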


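An alternative procedure would be to use the MGF technique. Under $H_0$, the RVs $iZ_{(i)}$ are independent and have the PMF

$$P\{iZ_{(i)} = i\} = P\{iZ_{(i)} = 0\} = \tfrac{1}{2}.$$

Thus

(31)   $$M(t) = Ee^{tT^+} = \prod_{i=1}^{n}\frac{e^{it} + 1}{2}.$$

We express $M(t)$ as a sum of terms of the form $a_j e^{jt}/2^n$. The PMF of $T^+$ can then be determined by inspection. For example, in the case $n = 4$, we have

$$M(t) = \prod_{i=1}^{4}\frac{e^{it} + 1}{2} = \left(\frac{e^{t} + 1}{2}\right)\left(\frac{e^{2t} + 1}{2}\right)\left(\frac{e^{3t} + 1}{2}\right)\left(\frac{e^{4t} + 1}{2}\right)$$

(32)   $$= \frac{1}{4}\left(e^{3t} + e^{2t} + e^{t} + 1\right)\left(\frac{e^{3t} + 1}{2}\right)\left(\frac{e^{4t} + 1}{2}\right)$$

(33)   $$= \frac{1}{8}\left(e^{6t} + e^{5t} + e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1\right)\left(\frac{e^{4t} + 1}{2}\right)$$

(34)   $$= \frac{1}{16}\left(e^{10t} + e^{9t} + e^{8t} + 2e^{7t} + 2e^{6t} + 2e^{5t} + 2e^{4t} + 2e^{3t} + e^{2t} + e^{t} + 1\right).$$

This method gives us the PMF of $T^+$ for $n = 2$, $n = 3$, and $n = 4$ immediately. Quite simply,

(35)   $$P_{H_0}\{T^+ = j\} = \text{coefficient of } e^{jt} \text{ in the expansion of } M(t), \quad j = 0, 1, \ldots, n(n+1)/2.$$

See Problem 3.3.12 for the PGF of $T^+$.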
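The expansion of $M(t)$ is a polynomial product in $x = e^t$, so the counts $n(t)$ of (27) can be generated mechanically for any $n$. The sketch below (the function name `signed_rank_null_counts` is an illustrative assumption) multiplies out $\prod_{i=1}^{n}(1 + x^i)$ one factor at a time; dividing each coefficient by $2^n$ gives the null PMF of $T^+$.

```python
def signed_rank_null_counts(n):
    """Return [n(0), n(1), ..., n(n(n+1)/2)], where n(t) is the number of
    subsets of {1, ..., n} with sum t, i.e. the coefficients of
    prod_{i=1}^{n} (1 + x^i)."""
    counts = [1]                       # the empty product is the constant 1
    for i in range(1, n + 1):
        new = counts + [0] * i         # multiplying by 1 keeps old terms
        for t, c in enumerate(counts):
            new[t + i] += c            # multiplying by x^i shifts them by i
        counts = new
    return counts


counts3 = signed_rank_null_counts(3)
counts4 = signed_rank_null_counts(4)
pmf4 = [c / 2 ** 4 for c in counts4]
```

The lists for $n = 3$ and $n = 4$ agree with (29), (30), and (34); for larger $n$ the same routine yields tail probabilities such as $P_{H_0}\{T^+ \ge t\}$ by summing the upper coefficients.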


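Example 8. Let us return to the data of Example 5 and test $H_0\colon \mathfrak{z}_{1/2} = \mu = -1.0$ against $H_1\colon \mathfrak{z}_{1/2} > -1.0$. Ranking $|x_i - \mathfrak{z}_{1/2}|$ in increasing order of magnitude, we have

   0.016 < 0.131 < 0.535 < 0.762 < 1.056 < 1.120 < 1.417 < 1.561
     5       4       1       3       7       2       6       8

where the second row gives the index $i$ of the corresponding observation. Thus

$$r_1 = 3, \quad r_2 = 6, \quad r_3 = 4, \quad r_4 = 2, \quad r_5 = 1, \quad r_6 = 7, \quad r_7 = 5, \quad r_8 = 8,$$

and

$$T^+ = 3 + 6 + 4 + 2 + 7 + 5 + 8 = 35.$$

From Table ST10, $H_0$ is rejected at level $\alpha = 0.05$ if $T^+ \ge 31$. Since $35 > 31$, we reject $H_0$.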
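The value $T^+ = 35$ can be reproduced in two ways: from the ranks of the absolute deviations, as in (20), and by counting the pairs $i \le j$ with a positive sum, as in identity (26). The following sketch assumes no ties and no zero deviations (as holds with probability one for a continuous DF); the function names are hypothetical.

```python
def wilcoxon_t_plus(sample, z0=0.0):
    """Sum of the ranks of |x_i - z0| over the observations with x_i - z0 > 0."""
    d = [x - z0 for x in sample]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)


def t_plus_via_pairs(sample, z0=0.0):
    """Number of pairs i <= j with (x_i - z0) + (x_j - z0) > 0, as in (26)."""
    d = [x - z0 for x in sample]
    n = len(d)
    return sum(1 for i in range(n) for j in range(i, n) if d[i] + d[j] > 0)


data = [-0.465, 0.120, -0.238, -0.869, -1.016, 0.417, 0.056, 0.561]
t_plus = wilcoxon_t_plus(data, z0=-1.0)
t_pairs = t_plus_via_pairs(data, z0=-1.0)
```

Both routes give the same statistic; the rank form needs a sort, while the pair form makes the link to the U-statistics $U_1$ and $U_2$ explicit.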




Remark 1. The Wilcoxon test statistic can also be used to test for symmetry. Let $X_1, X_2, \ldots, X_n$ be iid observations on an RV with absolutely continuous DF $F$. We set the null hypothesis as

$$H_0\colon \mathfrak{z}_{1/2} = \mathfrak{z}_0, \text{ and DF } F \text{ is symmetric about } \mathfrak{z}_0.$$

The alternative is

$$H_1\colon \mathfrak{z}_{1/2} \neq \mathfrak{z}_0 \text{ and } F \text{ symmetric, or } F \text{ asymmetric}.$$

The test is the same since the null distribution of $T^+$ is the same.

Remark 2. If we have $n$ independent pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ from a bivariate DF, we form the differences $Z_i = X_i - Y_i$, $i = 1, 2, \ldots, n$. Assuming that $Z_1, Z_2, \ldots, Z_n$ are (independent) observations from a population of differences with absolutely continuous DF $F$ that is symmetric with median $\mathfrak{z}_{1/2}$, we can use the Wilcoxon statistic to test $H_0\colon \mathfrak{z}_{1/2} = \mathfrak{z}_0$.


We present some examples. 
Example 9. For the data of Example 10.3.3, let us apply the Wilcoxon statistic to test $H_0\colon \mathfrak{z}_{1/2} = 0$ and $F$ is symmetric against $H_1\colon \mathfrak{z}_{1/2} \neq 0$ and $F$ symmetric or $F$ not symmetric.

The absolute values, when arranged in increasing order of magnitude, are as follows (the second row of each group gives the index of the corresponding observation):

     0.057 < 0.068 < 0.137 < 0.261 < 0.323 < 0.464 < 0.482 < 0.486
      13       5       2      17       4       1      11      15
   < 0.508 < 0.513 < 0.525 < 0.595 < 0.881 < 0.906 < 1.046
      20       7       8       9      10       6      19
   < 1.229 < 1.237 < 1.678 < 1.787 < 2.455
      14      18      12      16       3

Thus

$$r_1 = 6,\ r_2 = 3,\ r_3 = 20,\ r_4 = 5,\ r_5 = 2,\ r_6 = 14,\ r_7 = 10,\ r_8 = 11,\ r_9 = 12,\ r_{10} = 13,$$
$$r_{11} = 7,\ r_{12} = 18,\ r_{13} = 1,\ r_{14} = 16,\ r_{15} = 8,\ r_{16} = 19,\ r_{17} = 4,\ r_{18} = 17,\ r_{19} = 15,\ r_{20} = 9,$$

and

$$T^+ = 6 + 3 + 20 + 14 + 12 + 13 + 18 + 17 + 15 = 118.$$

From Table ST10 we see that $H_0$ cannot be rejected even at level $\alpha = 0.20$.

Example 10. Returning to the data of Example 6, we apply the Wilcoxon test to the differences $Z_i = X_i - Y_i$. The differences are -6, 3, 1, -8, -17, -20, -3, -3, -11, 9. To test $H_0\colon \mathfrak{z}_{1/2} = 0$ against $H_1\colon \mathfrak{z}_{1/2} \neq 0$, we rank the absolute values of $z_i$ in increasing order to get

$$1 < 3 = 3 = 3 < 6 < 8 < 9 < 11 < 17 < 20$$

and

$$T^+ = 1 + 2 + 7 = 10.$$

Here we have assigned ranks 2, 3, 4 to observations +3, -3, -3. (If we assign rank 4 to observation 3, then $T^+ = 12$ without appreciably changing the result.)

From Table ST10 we reject $H_0$ at $\alpha = 0.05$ if either $T^+ > 46$ or $T^+ < 9$. Since $T^+ > 9$ and $< 46$, we accept $H_0$. Note that hypothesis $H_0$ was also accepted by the sign test.


For large samples we use the normal approximation. In fact, from (26) we see that

$$\frac{\sqrt{n}\,(T^+ - ET^+)}{\binom{n}{2}} = \frac{n^{3/2}}{\binom{n}{2}}\,(U_1 - EU_1) + \sqrt{n}\,(U_2 - EU_2).$$

Clearly $U_1 - EU_1 \xrightarrow{P} 0$, and since $n^{3/2}/\binom{n}{2} \to 0$, the first term $\to 0$ in probability as $n \to \infty$. By Slutsky's theorem (Theorem 6.2.15) it follows that

$$\frac{\sqrt{n}}{\binom{n}{2}}\,(T^+ - ET^+) \quad \text{and} \quad \sqrt{n}\,(U_2 - EU_2)$$

have the same limiting distribution. From Theorem 13.2.3 and Example 13.2.7 it follows that $\sqrt{n}\,(U_2 - EU_2)$, and hence $(T^+ - ET^+)\sqrt{n}/\binom{n}{2}$, has a limiting normal distribution with mean 0 and variance

$$4\zeta_1 = 4P_F(X_1 + X_2 > 0,\ X_1 + X_3 > 0) - 4P_F^2(X_1 + X_2 > 0).$$

Under $H_0$, the RVs $iZ_{(i)}$ are independent, each taking the values 0 and $i$ with probability $\frac{1}{2}$, so

$$E_{H_0}T^+ = \frac{n(n+1)}{4} \quad \text{and} \quad \operatorname{var}_{H_0}T^+ = \frac{1}{4}\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{24}.$$

Also, under $H_0$, $F$ is continuous and symmetric, so

$$P_F(X_1 + X_2 > 0) = \int_{-\infty}^{\infty} P_F(X_1 > -x)\,f(x)\,dx = \frac{1}{2},$$

and

$$P_F(X_1 + X_2 > 0,\ X_1 + X_3 > 0) = \int_{-\infty}^{\infty} [P_F(X_1 > -x)]^2 f(x)\,dx = \frac{1}{3}.$$

However,

$$\frac{(\operatorname{var}_{H_0}T^+)^{1/2}}{\binom{n}{2}\sqrt{\dfrac{1}{3n}}} = \frac{[n(n+1)(2n+1)/24]^{1/2}}{\dfrac{n(n-1)}{2}\sqrt{\dfrac{1}{3n}}} \to 1$$

as $n \to \infty$. Consequently, under $H_0$,

$$T^+ \sim AN\left(\frac{n(n+1)}{4},\ \frac{n(n+1)(2n+1)}{24}\right).$$

Thus, for large enough $n$ we can determine the critical values for a test based on $T^+$ by using normal approximation.

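As an example, take $n = 20$. From Table ST10 the P-value associated with $t^+ = 140$ is 0.10. Using normal approximation with $E_{H_0}T^+ = 105$ and $(\operatorname{var}_{H_0}T^+)^{1/2} = \sqrt{717.5} \approx 26.79$ yields

$$P_{H_0}(T^+ \ge 140) \approx P\left(Z \ge \frac{140 - 105}{26.79}\right) = P(Z \ge 1.31) \approx 0.095.$$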


PROBLEMS 13.3 


1. Prove Theorem 4. 


2. A random sample of size 16 from a continuous DF on [0, 1] yields the following data: 0.59, 0.72, 0.47, 0.43, 0.31, 0.56, 0.22, 0.90, 0.96, 0.78, 0.66, 0.18, 0.73, 0.43, 0.58, 0.11. Test the hypothesis that the sample comes from U[0, 1].


3. Test the goodness of fit of normality for the data of Problem 10.3.6 using the 
Kolmogorov—Smirnov test. 


4. For the data of Problem 10.3.6, find a 0.95 level confidence band for the distri- 
bution function. 


5. The following data represent a sample of size 20 from U[0, 1]: 0.277, 0.435, 0.130, 0.143, 0.853, 0.889, 0.294, 0.697, 0.940, 0.648, 0.324, 0.482, 0.540, 0.152, 0.477, 0.667, 0.741, 0.882, 0.885, 0.740. Construct a 0.90 level confidence band for $F(x)$.

6. In Problem 5, test the hypothesis that the distribution is U[0, 1]. Take $\alpha = 0.05$.

7. For the data of Example 2, test, by means of the sign test, the null hypothesis $H_0\colon \mu = 1.5$ against $H_1\colon \mu \neq 1.5$.

8. For the data of Problem 5, test the hypothesis that the quantile of order $p = 0.20$ is 0.20.

9. For the data of Problem 10.4.8, use the sign test to test the hypothesis of no difference between the two averages.

10. Use the sign test for the data of Problem 10.4.9 to test the hypothesis of no difference in grade-point averages.

11. For the data of Problem 5, apply the signed-rank test to test $H_0\colon \mathfrak{z}_{1/2} = 0.5$ against $H_1\colon \mathfrak{z}_{1/2} \neq 0.5$.

12. For the data of Problems 10.4.8 and 10.4.9, apply the signed-rank test to the differences to test $H_0\colon \mathfrak{z}_{1/2} = 0$ against $H_1\colon \mathfrak{z}_{1/2} \neq 0$.


13.4 SOME TWO-SAMPLE PROBLEMS 


In this section we consider some two-sample tests. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from two absolutely continuous distribution functions $F_X$ and $F_Y$, respectively. The problem is to test the null hypothesis $H_0\colon F_X(x) = F_Y(x)$ for all $x \in \mathcal{R}$ against the usual one- and two-sided alternatives.
Tests of Ho depend on the type of alternative specified. We state some of the 
alternatives of interest even though we do not consider all of these in this book. 


I    Location alternative: $F_Y(x) = F_X(x - \theta)$, $\theta \neq 0$.
II   Scale alternative: $F_Y(x) = F_X(x/\sigma)$, $\sigma > 0$.
III  Lehmann alternative: $F_Y(x) = 1 - [1 - F_X(x)]^{\theta+1}$, $\theta + 1 > 0$.
IV   Stochastic alternative: $F_Y(x) \ge F_X(x)$ for all $x$, and $F_Y(x) > F_X(x)$ for at least one $x$.
V    General alternative: $F_Y(x) \neq F_X(x)$ for some $x$.

Some comments are in order. Clearly, I through IV are special cases of V. Alternatives I and II show differences in $F_X$ and $F_Y$ in location and scale, respectively. Alternative III states that $P(Y > x) = [P(X > x)]^{\theta+1}$. In the special case when $\theta$ is an integer, it states that $Y$ has the same distribution as the smallest of $\theta + 1$ of the X-variables. A similar alternative that is sometimes used is $F_Y(x) = [F_X(x)]^{\alpha}$ for some $\alpha > 0$ and all $x$. When $\alpha$ is an integer, this states that $Y$ is distributed as the largest of $\alpha$ of the X-variables. Alternative IV refers to the relative magnitudes of X's and Y's. It states that

$$P(Y \le x) \ge P(X \le x) \quad \text{for all } x,$$

so that

(1)   $$P(Y > x) \le P(X > x),$$

for all $x$. In other words, X's tend to be larger than the Y's.


Definition 1. We say that a continuous RV X is stochastically larger than a con- 
tinuous RV Y if inequality (1) is satisfied for all x with strict inequality for some x. 


A similar interpretation may be given to the one-sided alternative $F_X > F_Y$. In the 
special case where both X and Y are normal RVs with means $\mu_1$, $\mu_2$ and common 
variance $\sigma^2$, $F_X = F_Y$ corresponds to $\mu_1 = \mu_2$ and $F_X > F_Y$ corresponds to 
$\mu_1 < \mu_2$. 

In this section we consider some common two-sample tests for location (case 
I) and stochastic ordering (case IV) alternatives. First, note that a test of stochastic 
ordering may also be used as a test of less restrictive location alternatives since, 
for example, $F_X > F_Y$ corresponds to larger Y’s and hence larger location for Y. 
Second, we note that the chi-square test of homogeneity described in Section 10.3 
can be used to test general alternatives (case V) $H_1: F(x) \neq G(x)$ for some x. 
Briefly, one partitions the real line into Borel sets $A_1, A_2, \ldots, A_k$. Let 


$p_{i1} = P(X_j \in A_i)$ and $p_{i2} = P(Y_j \in A_i)$, 


$i = 1, 2, \ldots, k$. Under $H_0: F = G$, $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$, which is the 
problem of testing equality of two independent multinomial distributions discussed 
in Section 10.3. 

We first consider a simple test of location. This test, based on the sample median 
of the combined sample, is a test of the equality of medians of the two DFs. It will 
tend to accept Ho: F = G even if the shapes of F and G are different as long as 
their medians are equal. 


13.4.1 Median Test 


The combined sample $X_1, X_2, \ldots, X_m, Y_1, Y_2, \ldots, Y_n$ is ordered and a sample 
median is found. If m + n is odd, the median is the [(m + n + 1)/2]th value in the ordered 
arrangement. If m + n is even, the median is any number between the two middle 
values. Let V be the number of observed values of X that are less than or equal to the 
sample median for the combined sample. If V is large, it is reasonable to conclude that 
the actual median of X is smaller than the median of Y. One therefore rejects $H_0: F = G$ 
in favor of $H_1: F(x) \ge G(x)$ for all x and $F(x) > G(x)$ for some x if V is too 
large, that is, if $V \ge c$. If, however, the alternative is $F(x) \le G(x)$ for all x and 
$F(x) < G(x)$ for some x, the median test rejects $H_0$ if $V \le c$. For the two-sided 
alternative that $F(x) \neq G(x)$ for some x, we use the two-sided test. 

We next compute the null distribution of the RV V. Under $H_0$ every set of p positions 
among the m + n ordered values is equally likely to be the set of the p smallest, so V is 
hypergeometric. If m + n = 2p, p a positive integer, then 

(2) $P_{H_0}\{V = v\} = P_{H_0}\{\text{exactly } v \text{ of the } X_i\text{’s are} \le \text{combined median}\}$ 

$$= \begin{cases} \dfrac{\binom{m}{v}\binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, 2, \ldots, m, \\ 0, & \text{otherwise.} \end{cases}$$


Here $0 \le V \le \min(m, p)$. If m + n = 2p + 1, where p > 0 is an integer, the 
[(m + n + 1)/2]th value is the median in the combined sample, and 


(3) $P_{H_0}\{V = v\} = P\{\text{exactly } v \text{ of the } X_i\text{’s are below the } (p + 1)\text{th value in the ordered arrangement}\}$ 

$$= \begin{cases} \dfrac{\binom{m}{v}\binom{n}{p-v}}{\binom{m+n}{p}}, & v = 0, 1, \ldots, \min(m, p), \\ 0, & \text{otherwise.} \end{cases}$$

Remark 1. Under $H_0$ we expect (m + n)/2 observations above the median and 
(m + n)/2 below the median. One can therefore apply the chi-square test with 1 d.f. 
to test $H_0$ against the two-sided alternative. 


Example 1. The following data represent lifetimes (hours) of batteries for two 
different brands: 


Brand A: 40 30 40 45 55 30 
Brand B: 50 50 45 55 60 40 


The combined ordered sample is 30, 30, 40, 40, 40, 45, 45, 50, 50, 55, 55, 60. 
Since m + n = 12 is even, the median is 45. Thus 


v = number of observed values of X that are less than or equal to 45 
= 5. 
Now 
$$P_{H_0}\{V \ge 5\} = \frac{\binom{6}{5}\binom{6}{1}}{\binom{12}{6}} + \frac{\binom{6}{6}\binom{6}{0}}{\binom{12}{6}} \approx 0.04.$$

Since $P_{H_0}\{V \ge 5\} > 0.025$, we cannot reject $H_0$, that the two samples come from 
the same population. 
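
The tail probability in Example 1 can be checked numerically. The following is a minimal sketch, assuming the hypergeometric null distribution of V and, as in the text, ignoring the ties at the median; the function name `median_test_pmf` is illustrative only.

```python
from math import comb

def median_test_pmf(m, n, v):
    """P{V = v} under H0: hypergeometric with p = (m + n) // 2 values
    at or below (even case) or strictly below (odd case) the median."""
    p = (m + n) // 2
    if v < 0 or v > min(m, p) or p - v > n:
        return 0.0
    return comb(m, v) * comb(n, p - v) / comb(m + n, p)

brand_a = [40, 30, 40, 45, 55, 30]   # X sample
brand_b = [50, 50, 45, 55, 60, 40]   # Y sample
combined = sorted(brand_a + brand_b)
half = len(combined) // 2
# m + n is even: take the midpoint of the two middle values (both are 45 here)
median = (combined[half - 1] + combined[half]) / 2
v = sum(x <= median for x in brand_a)

m, n = len(brand_a), len(brand_b)
upper_tail = sum(median_test_pmf(m, n, k) for k in range(v, m + 1))
print(v, round(upper_tail, 3))
```

The upper tail is 37/924, which agrees with the value 0.04 quoted in the example.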




We now consider two tests of the stochastic alternatives. As mentioned earlier, 
they may also be used as tests of location. 


13.4.2 Kolmogorov-Smirnov Test 


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent random samples from 
continuous DFs F and G, respectively. Let $F_m^*$ and $G_n^*$, respectively, be the empirical 
DFs of the X’s and Y’s. Recall that $F_m^*$ is the U-statistic for F, and $G_n^*$ that for G. 
Under $H_0: F(x) = G(x)$ for all x, we expect a reasonable agreement between the 
two sample DFs. We define 


(4) $D_{m,n} = \sup_x |F_m^*(x) - G_n^*(x)|$. 


Then $D_{m,n}$ may be used to test $H_0$ against the two-sided alternative $H_1: F(x) \neq 
G(x)$ for some x. The test rejects $H_0$ at level $\alpha$ if 


(5) $D_{m,n} \ge D_{m,n,\alpha}$, 


where $P_{H_0}\{D_{m,n} \ge D_{m,n,\alpha}\} \le \alpha$. 
Similarly, one can define the one-sided statistics 


(6) $D_{m,n}^+ = \sup_x [F_m^*(x) - G_n^*(x)]$ 
and 
(7) $D_{m,n}^- = \sup_x [G_n^*(x) - F_m^*(x)]$, 


to be used against the one-sided alternatives 


(8) $G(x) \le F(x)$ for all x and $G(x) < F(x)$ for some x 
with rejection region $D_{m,n}^+ \ge D_{m,n,\alpha}^+$ 

and 

(9) $F(x) \le G(x)$ for all x and $F(x) < G(x)$ for some x 
with rejection region $D_{m,n}^- \ge D_{m,n,\alpha}^+$, 

respectively. 


For small samples, tables due to Massey [70] are available. In Table ST9 we give 
the values of $D_{m,n,\alpha}$ and $D_{m,n,\alpha}^+$ for some selected values of m, n, and $\alpha$. Table ST8 
gives the corresponding values for the m = n case. 

For large samples we use the limiting result due to Smirnov [105]. Let N = 
mn/(m + n). Then 

(10) $$\lim_{m,n\to\infty} P\{\sqrt{N}\, D_{m,n}^+ \le \lambda\} = \begin{cases} 1 - e^{-2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \le 0, \end{cases}$$

and 

(11) $$\lim_{m,n\to\infty} P\{\sqrt{N}\, D_{m,n} \le \lambda\} = \begin{cases} \sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2\lambda^2}, & \lambda > 0, \\ 0, & \lambda \le 0. \end{cases}$$


Relations (10) and (11) give the limiting distributions of $D_{m,n}^+$ and $D_{m,n}$, respectively, under 
$H_0: F(x) = G(x)$ for all $x \in \mathbb{R}$. 


Example 2. Let us apply the test to data from Example 1. Do the two brands 
differ with respect to average life? 

Let us first apply the Kolmogorov-Smirnov test to test $H_0$ that the population 
distribution of length of life for the two brands is the same. 


  x     $F_6^*(x)$   $G_6^*(x)$   $|F_6^*(x) - G_6^*(x)|$ 

  30    2/6          0            2/6 
  40    4/6          1/6          3/6 
  45    5/6          2/6          3/6 
  50    5/6          4/6          1/6 
  55    1            5/6          1/6 
  60    1            1            0 

$D_{6,6} = \sup_x |F_6^*(x) - G_6^*(x)| = 3/6 = 1/2$. 


From Table ST8 we read the critical value $D_{6,6,0.05}$ for m = n = 6 at level $\alpha = 0.05$. 
Since $D_{6,6} = 1/2 < D_{6,6,0.05}$, we accept $H_0$ that the population distribution for the 
length of life for the two brands is the same. 

Let us next apply the two-sample t-test. We have $\bar{x} = 40$, $\bar{y} = 50$, $s_1^2 = 90$, 
$s_2^2 = 50$, $s_p^2 = 70$. Thus 


$$t = \frac{40 - 50}{\sqrt{70}\sqrt{\tfrac16 + \tfrac16}} = -2.07.$$


Since $t_{10,0.025} = 2.2281$ and $|t| < 2.2281$, we accept the hypothesis that the two samples come from 
the same (normal) population. 
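
As a check on the table in Example 2, here is a minimal sketch that evaluates the two empirical DFs at every observed value and takes the largest differences. The helper names `ecdf` and `ks_two_sample` are assumptions made for this illustration, and exact fractions are used so that the result can be compared with the table directly.

```python
from fractions import Fraction

def ecdf(sample, x):
    """Empirical DF of `sample` at x: proportion of observations <= x."""
    return Fraction(sum(v <= x for v in sample), len(sample))

def ks_two_sample(xs, ys):
    """Return (D, D_plus, D_minus) as in (4), (6), (7).
    Both empirical DFs are step functions that jump only at data points,
    so the suprema are attained at observed values."""
    points = sorted(set(xs) | set(ys))
    diffs = [ecdf(xs, t) - ecdf(ys, t) for t in points]
    d_plus = max(diffs)
    d_minus = max(-d for d in diffs)
    return max(d_plus, d_minus), d_plus, d_minus

brand_a = [40, 30, 40, 45, 55, 30]
brand_b = [50, 50, 45, 55, 60, 40]
d, d_plus, d_minus = ks_two_sample(brand_a, brand_b)
print(d, d_plus, d_minus)
```

For the battery data this gives D = 1/2, in agreement with the table; the critical value must still be taken from the tables cited in the text.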
The second test of stochastic ordering alternatives we consider is the Mann-Whitney-Wilcoxon 
test, which can be viewed as a test based on a U-statistic. 


13.4.3 Mann—Whitney—Wilcoxon Test 


Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be independent samples from two 
continuous DFs, F and G, respectively. As in Example 13.2.9, let 


$$T(X_i; Y_j) = \begin{cases} 1 & \text{if } X_i < Y_j, \\ 0 & \text{if } X_i \ge Y_j, \end{cases}$$


for $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. Recall that $T(X_i; Y_j)$ is an unbiased 
estimator of $g(F, G) = P_{F,G}(X < Y)$ and the two-sample U-statistic for g is given 
by $U_1(\mathbf{X}; \mathbf{Y}) = (mn)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n} T(X_i; Y_j)$. For notational convenience, let us 
write 


(12) $U = mn\,U_1(\mathbf{X}; \mathbf{Y}) = \sum_{i=1}^{m} \sum_{j=1}^{n} T(X_i; Y_j)$. 

Then U counts, for each of $Y_1, Y_2, \ldots, Y_n$, the number of values of $X_1, X_2, \ldots, X_m$ 
that are smaller than it; that is, U is the number of pairs $(X_i, Y_j)$ with $X_i < Y_j$. 
The statistic U is called the Mann-Whitney statistic. An alternative 
equivalent form using Wilcoxon scores is the linear rank statistic given by 

(13) $W = \sum_{j=1}^{n} Q_j$, 


where $Q_j$ = rank of $Y_j$ among the combined m + n observations. Indeed, 
$Q_j$ = rank of $Y_j$ = (no. of $X_i$’s < $Y_j$) + rank of $Y_j$ in Y’s. 
Thus 


(14) $W = \sum_{j=1}^{n} Q_j = U + \sum_{j=1}^{n} j = U + \dfrac{n(n + 1)}{2}$, 


so that U and W are equivalent test statistics; hence the name Mann-Whitney-Wilcoxon 
test. We restrict our attention to U as the test statistic. 


Example 3. Let m = 4, n = 3, and suppose that the combined sample when 
ordered is as follows: 


$x_2 < x_1 < y_3 < y_2 < x_4 < y_1 < x_3$. 


Then U = 7, since there are three values of $x < y_1$, two values of $x < y_2$, and two 
values of $x < y_3$. Also, W = 3 + 4 + 6 = 13, so U = 13 − 3(4)/2 = 7. 
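
The relation (14) between U and W can be illustrated on the ordering of Example 3. The sketch below assigns the hypothetical values 1 to 7 to the ordered observations (any values with the same ordering would do) and assumes no ties; the function names are illustrative.

```python
def mann_whitney_u(xs, ys):
    """U of (12): number of pairs (x, y) with x < y."""
    return sum(1 for x in xs for y in ys if x < y)

def wilcoxon_w(xs, ys):
    """W of (13): sum of the ranks of the y's in the combined sample (no ties)."""
    combined = sorted(xs + ys)
    return sum(combined.index(y) + 1 for y in ys)

# Ordering x2 < x1 < y3 < y2 < x4 < y1 < x3, coded by positions 1..7
xs = [2, 1, 7, 5]   # x1, x2, x3, x4
ys = [6, 4, 3]      # y1, y2, y3

u = mann_whitney_u(xs, ys)
w = wilcoxon_w(xs, ys)
n = len(ys)
print(u, w, w - n * (n + 1) // 2)
```

Both routes give U = 7, with W = 13 as in the example.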


Note that U = 0 if all the $X_i$’s are larger than all the $Y_j$’s and U = mn if all the 
$X_i$’s are smaller than all the $Y_j$’s, because then there are m X’s < $Y_1$, m X’s < $Y_2$, 
and so on. Thus $0 \le U \le mn$. If U is large, the values of Y tend to be larger than 
the values of X (Y is stochastically larger than X), and this supports the alternative 
$F(x) \ge G(x)$ for all x and $F(x) > G(x)$ for some x. Similarly, if U is small, 
the Y values tend to be smaller than the X values, and this supports the alternative 
$F(x) \le G(x)$ for all x and $F(x) < G(x)$ for some x. We summarize these results as 
follows: 


  $H_0$      $H_1$         Reject $H_0$ if: 
  F = G      F ≥ G         $U \ge c_1$ 
  F = G      F ≤ G         $U \le c_2$ 
  F = G      F ≠ G         $U \ge c_3$ or $U \le c_4$ 


To compute the critical values we need the null distribution of U. Let 
(15) $p_{m,n}(u) = P_{H_0}\{U = u\}$. 


We will set up a difference equation relating $p_{m,n}$ to $p_{m-1,n}$ and $p_{m,n-1}$. If the 
observations are arranged in increasing order of magnitude, the largest value can be 
either an x value or a y value. Under $H_0$, each of the m + n values is equally likely to be 
the largest, so the probability that the largest value will be an x value is m/(m + n) and 
that it will be a y value is n/(m + n). 

Now, if the largest value is an x, it does not contribute to U, and the remaining 
m − 1 values of x and n values of y can be arranged to give the observed value 
U = u with probability $p_{m-1,n}(u)$. If the largest value is a y, this value is larger 
than all the m x’s. Thus, to get U = u, the remaining n − 1 values of y and m values 
of x contribute U = u − m. It follows that 


(16) $p_{m,n}(u) = \dfrac{m}{m+n}\, p_{m-1,n}(u) + \dfrac{n}{m+n}\, p_{m,n-1}(u - m)$. 

If m = 0, then for $n \ge 1$, 
(17) $p_{0,n}(u) = 1$ if u = 0, and $p_{0,n}(u) = 0$ if u > 0. 
If n = 0, $m \ge 1$, then 
(18) $p_{m,0}(u) = 1$ if u = 0, and $p_{m,0}(u) = 0$ if u > 0, 
and 
(19) $p_{m,n}(u) = 0$ if u < 0, $m \ge 0$, $n \ge 0$. 


For small values of m and n, one can easily compute the null PMF of U. Thus, if 
m = n = 1, then 
$p_{1,1}(0) = \tfrac12$ and $p_{1,1}(1) = \tfrac12$. 

If m = 1, n = 2, then 
$p_{1,2}(0) = p_{1,2}(1) = p_{1,2}(2) = \tfrac13$. 
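
The difference equation (16) with boundary conditions (17) to (19) translates directly into a short recursive routine. The following is a sketch under those assumptions; the names `p_null` and `null_pmf` are illustrative, and exact fractions are used so that small cases can be compared with the values above.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p_null(m, n, u):
    """p_{m,n}(u) = P{U = u} under H0, computed from recursion (16)."""
    if u < 0:                       # boundary condition (19)
        return Fraction(0)
    if m == 0 or n == 0:            # boundary conditions (17) and (18)
        return Fraction(1) if u == 0 else Fraction(0)
    total = m + n
    return (Fraction(m, total) * p_null(m - 1, n, u)
            + Fraction(n, total) * p_null(m, n - 1, u - m))

def null_pmf(m, n):
    """Null PMF of U on its support 0, 1, ..., mn."""
    return [p_null(m, n, u) for u in range(m * n + 1)]

print(null_pmf(1, 1))   # two equal masses
print(null_pmf(1, 2))   # three equal masses
```

An upper-tail probability such as $P_{H_0}\{U \ge u\}$ is then `sum(null_pmf(m, n)[u:])`, which can be used in place of tables when m and n are small.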


Tables for critical values are available for small values of m and n, $m \le n$ (see, 
e.g., Auble [2] or Mann and Whitney [69]). Table ST11 gives the values of $u_\alpha$ for 
which $P_{H_0}\{U > u_\alpha\} \le \alpha$ for selected values of m, n, and $\alpha$. 

If m, n are large, we can use the asymptotic normality of U. In Example 13.2.10 
we showed that under $H_0$, 


$$\frac{U/(mn) - \tfrac12}{\sqrt{(m + n + 1)/(12mn)}} \xrightarrow{L} N(0, 1)$$


as $m, n \to \infty$ such that $m/(m + n) \to$ constant. The approximation is fairly good 
for $m, n \ge 8$. 


Example 4. Two samples are as follows: 


Values of $X_i$: 1, 2, 3, 5, 7, 9, 11, 18 
Values of $Y_j$: 4, 6, 8, 10, 12, 13, 14, 15, 19 

Thus m = 8, n = 9, and U = 3 + 4 + 5 + 6 + 7 + 7 + 7 + 7 + 8 = 54. The (exact) 
P-value is $P_{H_0}(U \ge 54) = 0.046$, so we reject $H_0$ at (two-tailed) level $\alpha = 0.1$. Let 
us apply the normal approximation. We have 


$$E_{H_0} U = \frac{8 \cdot 9}{2} = 36, \qquad \operatorname{var}_{H_0}(U) = \frac{8 \cdot 9}{12}(8 + 9 + 1) = 108,$$
and 
$$Z = \frac{54 - 36}{\sqrt{108}} = \frac{18}{6\sqrt{3}} = \sqrt{3} = 1.732.$$


We note that P(Z > 1.73) = 0.042. 
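
The arithmetic of Example 4 can be reproduced in a few lines. This is a sketch only: it assumes the null mean mn/2 and variance mn(m + n + 1)/12 used above, and it evaluates the normal tail through the standard-library error function rather than a table; the function name is illustrative.

```python
from math import erf, sqrt

def mww_normal_approx(u, m, n):
    """Return (mean, variance, z, upper-tail probability) for U under H0."""
    mean = m * n / 2
    var = m * n * (m + n + 1) / 12
    z = (u - mean) / sqrt(var)
    upper_tail = 0.5 * (1 - erf(z / sqrt(2)))   # P(Z > z) for standard normal Z
    return mean, var, z, upper_tail

xs = [1, 2, 3, 5, 7, 9, 11, 18]
ys = [4, 6, 8, 10, 12, 13, 14, 15, 19]
u = sum(1 for x in xs for y in ys if x < y)     # Mann-Whitney count
mean, var, z, tail = mww_normal_approx(u, len(xs), len(ys))
print(u, mean, var, round(z, 3), round(tail, 3))
```

The computed tail is about 0.042, matching the normal approximation quoted in the example.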


PROBLEMS 13.4 


1. For the data of Example 4, apply the median test. 


2. Twelve 4-year-old boys and twelve 4-year-old girls were observed during two 15- 
minute play sessions, and each child’s play during these two periods was scored 
as follows for incidence and degree of aggression: 
Boys: 86, 69, 72, 65, 113, 65, 118, 45, 141, 104, 41, 50 
Girls: 55, 40, 22, 58, 16, 7, 9, 16, 26, 36, 20, 15 


Test the hypothesis that there were gender differences in the amount of aggres- 
sion shown, using (a) the median test, and (b) the Mann—Whitney—Wilcoxon test. 
(Siegel [103]) 


3. To compare the variability of two brands of tires, the following mileages (1000 
miles) were obtained for eight tires of each kind: 


Brand A: 32.1, 2.6, 17.8, 28.4, 19.6, 21.4, 19.9, 3.1 
Brand B: 19.8, 27.6, 3.8, 27.6, 34.1, 18.7, 16.9, 17.9 


Test the null hypothesis that the two samples come from the same population, 
using the Mann-Whitney-Wilcoxon test. 


4. Use the data of Problem 2 to apply the Kolmogorov—Smirnov test. 
5. Apply the Kolmogorov-Smirnov test to the data of Problem 3. 


6. Yet another test for testing $H_0: F = G$ against general alternatives is the runs 
test. A run is a succession of one or more identical symbols which are preceded 
and followed by a different symbol (or no symbol). The length of a run is the 
number of like symbols in a run. The total number of runs, R, in the combined 
sample of X’s and Y’s when arranged in increasing order can be used as a test of 
$H_0$. Under $H_0$ the X and Y symbols are expected to be well mixed. A small value 
of R supports $H_1: F \neq G$. A test based on R is appropriate only for two-sided 
(general) alternatives. Tables of critical values are available. For large samples, 
one uses the normal approximation: 


$$R \sim AN\left(1 + \frac{2mn}{m+n},\ \frac{2mn(2mn - m - n)}{(m + n - 1)(m + n)^2}\right).$$


(a) Let $R_1$ = number of X-runs, $R_2$ = number of Y-runs, and $R = R_1 + R_2$. 
Under $H_0$, show that 


$$P(R_1 = r_1, R_2 = r_2) = k\,\frac{\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}},$$

where k = 2 if $r_1 = r_2$, k = 1 if $|r_1 - r_2| = 1$; $r_1 = 1, 2, \ldots, m$ and $r_2 = 
1, 2, \ldots, n$. 


(b) Show that 


$$P_{H_0}(R_1 = r_1) = \frac{\binom{m-1}{r_1-1}\binom{n+1}{r_1}}{\binom{m+n}{m}}, \qquad 0 < r_1 \le m.$$
7. Fifteen 3-year-old boys and fifteen 3-year-old girls were observed during two 
sessions of recess in a nursery school. Each child’s play was scored for incidence 
and degree of aggression as follows: 


Boys: 96, 65, 74, 78, 82, 121, 68, 79, 111, 48, 53, 92, 81, 31, 40 
Girls: 12, 47, 32, 59, 83, 144, 32, 15, 17, 82, 21, 34, 9, 15, 51 


Is there evidence to suggest that there are gender differences in the incidence and 
amount of aggression? Use both Mann—Whitney—Wilcoxon and runs tests. 


13.5 TESTS OF INDEPENDENCE 


Let X and Y be two RVs with joint DF F(x, y), and let $F_1$ and $F_2$, respectively, be 
the marginal DFs of X and Y. In this section we study some tests of the hypothesis 
of independence, namely, 


$H_0: F(x, y) = F_1(x)F_2(y)$ for all $(x, y) \in \mathbb{R}^2$ 
against the alternative 
$H_1: F(x, y) \neq F_1(x)F_2(y)$ for some (x, y). 


If the joint distribution function F is bivariate normal, we know that X and Y are 
independent if and only if the correlation coefficient $\rho = 0$. In this case, the test of 
independence is to test $H_0: \rho = 0$. 

In the nonparametric situation the most commonly used test of independence is 
the chi-square test, which we now study. 


13.5.1 Chi-Square Test of Independence (Contingency Tables) 


Let X and Y be two RVs, and suppose that we have n observations on (X, Y). Let 
us divide the space of values assumed by X (the real line) into r mutually exclusive 
intervals $A_1, A_2, \ldots, A_r$. Similarly, the space of values of Y is divided into c disjoint 
intervals $B_1, B_2, \ldots, B_c$. As a rule of thumb, we choose the length of each interval 
in such a way that the probability that X (Y) lies in an interval is approximately 
1/r (1/c). Moreover, it is desirable to have n/r and n/c at least equal to 5. Let $X_{ij}$ 
denote the number of pairs $(X_k, Y_k)$, $k = 1, 2, \ldots, n$, that lie in $A_i \times B_j$, and let 


(1) $p_{ij} = P\{(X, Y) \in A_i \times B_j\} = P\{X \in A_i \text{ and } Y \in B_j\}$, 
where $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$. If each $p_{ij}$ is known, the quantity 


(2) $\sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(X_{ij} - np_{ij})^2}{np_{ij}}$ 


has approximately a chi-square distribution with rc − 1 d.f., provided that n is large 
(see Theorem 10.3.2). If X and Y are independent, $P\{(X, Y) \in A_i \times B_j\} = P\{X \in 
A_i\}P\{Y \in B_j\}$. Let us write $p_{i\cdot} = P\{X \in A_i\}$ and $p_{\cdot j} = P\{Y \in B_j\}$. Then under 
$H_0$: $p_{ij} = p_{i\cdot}\,p_{\cdot j}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$. In practice, $p_{ij}$ will not be 
known. We replace $p_{ij}$ by their estimates. Under $H_0$, we estimate $p_{i\cdot}$ by 


(3) $\hat{p}_{i\cdot} = \sum_{j=1}^{c} \dfrac{X_{ij}}{n}$, $i = 1, 2, \ldots, r$, 

and $p_{\cdot j}$ by 

(4) $\hat{p}_{\cdot j} = \sum_{i=1}^{r} \dfrac{X_{ij}}{n}$, $j = 1, 2, \ldots, c$. 


Since $\sum_{j=1}^{c} \hat{p}_{\cdot j} = 1 = \sum_{i=1}^{r} \hat{p}_{i\cdot}$, we have estimated only r − 1 + c − 1 = r + c − 2 
parameters. It follows (see Theorem 1.3.4) that the RV 


(5) $U = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(X_{ij} - n\hat{p}_{i\cdot}\hat{p}_{\cdot j})^2}{n\hat{p}_{i\cdot}\hat{p}_{\cdot j}}$ 


is asymptotically distributed as $\chi^2$ with rc − 1 − (r + c − 2) = (r − 1)(c − 1) 
d.f. under $H_0$. The null hypothesis is rejected if the computed value of U exceeds 
$\chi^2_{(r-1)(c-1),\alpha}$. 

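It is frequently convenient to list the observed and expected frequencies of the rc 
events $A_i \times B_j$ in an r × c table, called a contingency table, as follows: 


Observed Frequency $O_{ij}$ 

          $B_1$              $B_2$              ...   $B_c$              Total 
$A_1$     $X_{11}$           $X_{12}$           ...   $X_{1c}$           $\sum_j X_{1j}$ 
$A_2$     $X_{21}$           $X_{22}$           ...   $X_{2c}$           $\sum_j X_{2j}$ 
...       ...                ...                ...   ...                ... 
$A_r$     $X_{r1}$           $X_{r2}$           ...   $X_{rc}$           $\sum_j X_{rj}$ 
Total     $\sum_i X_{i1}$    $\sum_i X_{i2}$    ...   $\sum_i X_{ic}$    n 

Expected Frequency $E_{ij}$ 

          $B_1$                                    $B_2$                                    ...   $B_c$                                    Total 
$A_1$     $n\hat{p}_{1\cdot}\hat{p}_{\cdot 1}$     $n\hat{p}_{1\cdot}\hat{p}_{\cdot 2}$     ...   $n\hat{p}_{1\cdot}\hat{p}_{\cdot c}$     $n\hat{p}_{1\cdot}$ 
$A_2$     $n\hat{p}_{2\cdot}\hat{p}_{\cdot 1}$     $n\hat{p}_{2\cdot}\hat{p}_{\cdot 2}$     ...   $n\hat{p}_{2\cdot}\hat{p}_{\cdot c}$     $n\hat{p}_{2\cdot}$ 
...       ...                                      ...                                      ...   ...                                      ... 
$A_r$     $n\hat{p}_{r\cdot}\hat{p}_{\cdot 1}$     $n\hat{p}_{r\cdot}\hat{p}_{\cdot 2}$     ...   $n\hat{p}_{r\cdot}\hat{p}_{\cdot c}$     $n\hat{p}_{r\cdot}$ 
Total     $n\hat{p}_{\cdot 1}$                     $n\hat{p}_{\cdot 2}$                     ...   $n\hat{p}_{\cdot c}$                     n 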

Note that the $X_{ij}$’s in the table are frequencies. Once the category $A_i \times B_j$ is 
determined for an observation (X, Y), numerical values of X and Y are irrelevant. 
Next, we need to compute the expected frequency table. This is done quite simply by 
multiplying the row and column totals for each pair (i, j) and dividing the product 
by n. Then we compute the quantity 


$$\sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

and compare it with the tabulated $\chi^2$ value. In this form the test can be applied even 
to qualitative data. $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_c$ represent the two attributes, 
and the null hypothesis to be tested is that the attributes A and B are independent. 


Example 1. Following are the results for a random sample of 400 employees: 


                                 Annual Income (dollars) 

Time (years) with        Less Than      40,000-       More Than 
the Same Company         40,000         75,000        75,000         Total 

< 5                      50             75            25             150 
5-10                     25             50            25             100 
10 or more               25             75            50             150 
Total                    100            200           100            400 


If X denotes the length of service with the same company, and Y, the annual 
income, we wish to test the hypothesis that X and Y are independent. The expected 
frequencies are as follows: 


                                 Expected Frequency for Income of: 

Time (years) with        Less Than      40,000-       More Than 
the Same Company         40,000         75,000        75,000         Total 

< 5                      37.5           75            37.5           150 
5-10                     25             50            25             100 
10 or more               37.5           75            37.5           150 
Total                    100            200           100            400 

Thus 
$$U = \frac{(12.5)^2}{37.5} + 0 + \frac{(-12.5)^2}{37.5} + 0 + 0 + 0 + \frac{(-12.5)^2}{37.5} + 0 + \frac{(12.5)^2}{37.5} = 16.67.$$


The number of degrees of freedom is (3 − 1)(3 − 1) = 4, and $\chi^2_{4,0.05} = 9.488$. Since 
16.67 > 9.488, we reject $H_0$ at level 0.05 and conclude that length of service with a 
company is not independent of annual income. 
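
The computation in this example follows a fixed recipe: form the row and column totals, take expected counts as (row total × column total)/n, and sum the scaled squared deviations. A minimal sketch of that recipe, with an illustrative function name and the employee data entered as a list of rows, is:

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected frequency: (row total) * (column total) / n
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

observed = [
    [50, 75, 25],   # fewer than 5 years
    [25, 50, 25],   # 5 to 10 years
    [25, 75, 50],   # 10 or more years
]
stat, df, expected = chi_square_independence(observed)
print(round(stat, 2), df)
```

The statistic is 50/3, about 16.67, on 4 degrees of freedom; the comparison with the tabulated critical value 9.488 is then made as in the text.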


13.5.2 Kendall’s Tau 
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population. 


Definition 1. For any two pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ we say that the relation is 
perfect concordance (or agreement) if 


(6) $X_i < X_j$ whenever $Y_i < Y_j$ or $X_i > X_j$ whenever $Y_i > Y_j$ 
and that the relation is perfect discordance (disagreement) if 
(7) $X_i > X_j$ whenever $Y_i < Y_j$ or $X_i < X_j$ whenever $Y_i > Y_j$. 


Writing $\pi_c$ and $\pi_d$ for the probability of perfect concordance and of perfect 
discordance, respectively, we have 


(8) $\pi_c = P\{(X_j - X_i)(Y_j - Y_i) > 0\}$ 
and 
(9) $\pi_d = P\{(X_j - X_i)(Y_j - Y_i) < 0\}$, 


and if the marginal distributions of X and Y are continuous, 
(10) $\pi_c = (P\{Y_i < Y_j\} - P\{X_i > X_j \text{ and } Y_i < Y_j\})$ 
$\quad + (P\{Y_i > Y_j\} - P\{X_i < X_j \text{ and } Y_i > Y_j\}) = 1 - \pi_d$. 
Definition 2. The measure of association between the RVs X and Y defined by 
(11) $\tau = \pi_c - \pi_d$ 
is known as Kendall’s tau. 


If the marginal distributions of X and Y are continuous, we may rewrite (11), in 
view of (10), as follows: 


(12) $\tau = 1 - 2\pi_d = 2\pi_c - 1$. 
In particular, if X and Y are independent and continuous RVs, then 
$P\{X_i < X_j\} = P\{X_i > X_j\} = \tfrac12$, 
since then $X_i - X_j$ is a symmetric RV. Then 
$\pi_c = P\{X_i < X_j\}P\{Y_i < Y_j\} + P\{X_i > X_j\}P\{Y_i > Y_j\}$ 
$\quad = P\{X_i > X_j\}P\{Y_i < Y_j\} + P\{X_i < X_j\}P\{Y_i > Y_j\} = \pi_d$, 


and it follows that $\tau = 0$ for independent continuous RVs. 

Note that, in general, $\tau = 0$ does not imply independence. However, for the 
bivariate normal distribution, $\tau = 0$ if and only if the correlation coefficient $\rho$ between 
X and Y is 0, so that $\tau = 0$ if and only if X and Y are independent (Problem 6). 

Let 


(13) $$\psi((x_1, y_1), (x_2, y_2)) = \begin{cases} 1, & (y_2 - y_1)(x_2 - x_1) > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then $E\psi((X_1, Y_1), (X_2, Y_2)) = \pi_c = (1 + \tau)/2$, and we see that $\pi_c$ is estimable of 
degree 2, with symmetric kernel $\psi$ defined in (13). The corresponding one-sample 
U-statistic is given by 


(14) $U((X_1, Y_1), \ldots, (X_n, Y_n)) = \dbinom{n}{2}^{-1} \sum_{1 \le i < j \le n} \psi((X_i, Y_i), (X_j, Y_j))$. 


Then the corresponding estimator of Kendall’s tau is 
(15) $T = 2U - 1$ 


and is called Kendall’s sample correlation coefficient. 

Note that $-1 \le T \le 1$. To test $H_0$ that X and Y are independent against $H_1$: X 
and Y are dependent, we reject $H_0$ if |T| is large. Under $H_0$, $\tau = 0$, so that the null 
distribution of T is symmetric about 0. Thus we reject $H_0$ at level $\alpha$ if the observed 
value of T, t, satisfies $|t| > t_{\alpha/2}$, where $P\{|T| > t_{\alpha/2} \mid H_0\} = \alpha$. 

For small values of n the null distribution can be evaluated directly. Values for 
$4 \le n \le 10$ are tabulated by Kendall [49]. Table ST12 gives the values of $S_\alpha$ for 
which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2} T$, for selected values of n and $\alpha$. 


For a direct evaluation of the null distribution we note that the numerical value 
of T is clearly invariant under all order-preserving transformations. It is therefore 
convenient to order X and Y values and assign them ranks. If we write the pairs from 
the smallest to the largest according to, say, X values, the number of pairs of values 
of $1 \le i < j \le n$ for which $Y_j - Y_i > 0$ is the number of concordant pairs, P. 


Example 2. Let n = 4, and let us find the null distribution of T. There are 4! 
different permutations of ranks of Y: 


Ranks of X values: 1, 2, 3, 4 
Ranks of Y values: $a_1, a_2, a_3, a_4$ 


where $(a_1, a_2, a_3, a_4)$ is one of the 24 permutations of 1, 2, 3, 4. Since the 
distribution is symmetric about 0, we need only compute one-half of the distribution. 


  P     T        Number of Permutations    $P_{H_0}\{T = t\}$ 

  0     −1.00    1                         1/24 
  1     −0.67    3                         3/24 
  2     −0.33    5                         5/24 
  3      0.00    6                         6/24 


Similarly, for n = 3, the distribution of T under $H_0$ is as follows: 

  P     T        Number of Permutations    $P_{H_0}\{T = t\}$ 

  0     −1.00    1: (3, 2, 1)              1/6 
  1     −0.33    2: (2, 3, 1), (3, 1, 2)   2/6 

Example 3. Two judges rank four essays as follows: 


              Essay 
Judge      1   2   3   4 
1, X       3   4   2   1 
2, Y       3   1   4   2 


To test $H_0$: rankings of the two judges are independent, let us arrange the rankings 
of the first judge from 1 to 4. Then we have: 


Judge 1, X: 1, 2, 3, 4 
Judge 2, Y: 2, 4, 3, 1 


P = number of pairs of rankings for judge 2 such that for j > i, $Y_j - Y_i > 0$, which is 2 
[the pairs (2, 4) and (2, 3)], and 

$$t = \frac{2P}{\binom{4}{2}} - 1 = \frac{4}{6} - 1 = -0.33.$$

Since 
$$P_{H_0}\{|T| \ge 0.33\} = \frac{18}{24} = 0.75,$$
we cannot reject $H_0$. 
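
For small n the null distribution of T can be obtained by brute-force enumeration of the n! rank permutations, which also yields an exact P-value for the judges’ data. The following is a sketch under that approach; the function names are illustrative and ties are assumed absent.

```python
from fractions import Fraction
from itertools import combinations, permutations

def kendall_t(xs, ys):
    """T = 2U - 1, with U the proportion of concordant pairs as in (14)."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = sum(1 for i, j in pairs
                     if (xs[j] - xs[i]) * (ys[j] - ys[i]) > 0)
    return Fraction(2 * concordant, len(pairs)) - 1

def null_distribution_t(n):
    """Counts of each value of T over all n! equally likely rank permutations."""
    base = list(range(1, n + 1))
    counts = {}
    for perm in permutations(base):
        t = kendall_t(base, perm)
        counts[t] = counts.get(t, 0) + 1
    return counts

judge_x = [3, 4, 2, 1]
judge_y = [3, 1, 4, 2]
t_obs = kendall_t(judge_x, judge_y)

counts = null_distribution_t(4)
total = sum(counts.values())
p_value = Fraction(sum(c for t, c in counts.items() if abs(t) >= abs(t_obs)), total)
print(t_obs, p_value)
```

The enumeration reproduces the counts 1, 3, 5, 6 of Example 2 and the two-sided probability 18/24 used above.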


For large n we can use an extension of Theorem 13.3.3 to the bivariate case to 
conclude that $\sqrt{n}(U - \pi_c) \xrightarrow{L} N(0, 4\zeta_1)$, where 
$\zeta_1 = \operatorname{cov}\{\psi((X_1, Y_1), (X_2, Y_2)), \psi((X_1, Y_1), (X_3, Y_3))\}$. 


Under $H_0$ it can be shown that 


$$\frac{3\sqrt{n(n - 1)}}{\sqrt{2(2n + 5)}}\, T \xrightarrow{L} N(0, 1).$$


See, for example, Kendall [49], Randles and Wolfe [83] or Gibbons [32]. The 
approximation is good for $n \ge 8$. 
13.5.3 Spearman’s Rank Correlation Coefficient 


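Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample from a bivariate population. In 
Section 7.3 we defined the sample correlation coefficient by 


(16) $R = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}}$, 


where 
$\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$ and $\bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i$. 


If the sample values $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are each ranked from 1 
to n in increasing order of magnitude separately, and if the X’s and Y’s have 
continuous DFs, we get a unique set of rankings. The data will then reduce to n pairs of 
rankings. Let us write 


$R_i = \operatorname{rank}(X_i)$ and $S_i = \operatorname{rank}(Y_i)$; 


then $R_i$ and $S_i \in \{1, 2, \ldots, n\}$. Also, 


(17) $\sum_{i=1}^{n} R_i = \sum_{i=1}^{n} S_i = \dfrac{n(n + 1)}{2}$, 

(18) $\bar{R} = n^{-1} \sum_{i=1}^{n} R_i = \dfrac{n + 1}{2}$, $\quad \bar{S} = n^{-1} \sum_{i=1}^{n} S_i = \dfrac{n + 1}{2}$, 
and 
(19) $\sum_{i=1}^{n} (R_i - \bar{R})^2 = \sum_{i=1}^{n} (S_i - \bar{S})^2 = \dfrac{n(n^2 - 1)}{12}$. 


Substituting in (16), we obtain 


(20) $R = \dfrac{12 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{n(n^2 - 1)} = \dfrac{12 \sum_{i=1}^{n} R_i S_i}{n(n^2 - 1)} - \dfrac{3(n + 1)}{n - 1}$. 


Writing $D_i = R_i - S_i = (R_i - \bar{R}) - (S_i - \bar{S})$, we have 


$$\sum_{i=1}^{n} D_i^2 = \sum_{i=1}^{n} (R_i - \bar{R})^2 + \sum_{i=1}^{n} (S_i - \bar{S})^2 - 2 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})$$

$$= \tfrac16 n(n^2 - 1) - 2 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S}),$$

and it follows that 


(21) $R = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$. 


The statistic R defined in (20) and (21) is called Spearman’s rank correlation 
coefficient (see also Example 4.5.2). 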
From (20) we see that 


(22) $ER = \dfrac{12}{n(n^2 - 1)} E\left(\sum_{i=1}^{n} R_i S_i\right) - \dfrac{3(n + 1)}{n - 1} = \dfrac{12}{n^2 - 1} E(R_i S_i) - \dfrac{3(n + 1)}{n - 1}$. 


Under $H_0$, the RVs X and Y are independent, so that the ranks $R_i$ and $S_i$ are also 
independent. It follows that 


$$E_{H_0}(R_i S_i) = ER_i\, ES_i = \left(\frac{n + 1}{2}\right)^2$$


and 


(23) $E_{H_0} R = \dfrac{12}{n^2 - 1} \left(\dfrac{n + 1}{2}\right)^2 - \dfrac{3(n + 1)}{n - 1} = 0$. 


Thus we should reject $H_0$ if the absolute value of R is large, that is, reject $H_0$ if 
(24) $|R| > R_\alpha$, 


where $P_{H_0}\{|R| > R_\alpha\} \le \alpha$. To compute $R_\alpha$ we need the null distribution of R. 

For this purpose it is convenient to assume, without loss of generality, that $R_i = i$, 
$i = 1, 2, \ldots, n$. Then $D_i = i - S_i$, $i = 1, 2, \ldots, n$. Under $H_0$, X and Y being 
independent, the n! possible sets of rank pairs $(i, S_i)$ are equally likely. It follows that 


(25) $P_{H_0}\{R = r\} = (n!)^{-1} \times$ (number of permutations $(S_1, \ldots, S_n)$ for which R = r). 


Note that $-1 \le R \le 1$, and the extreme values can occur only when either the 
rankings match, that is, $R_i = S_i$, in which case R = 1, or $R_i = n + 1 - S_i$, in which 
case R = −1. Moreover, one need compute only one-half of the distribution, since 
it is symmetric about 0 (Problem 7). 

In the following example we compute the distribution of R for n = 3 and 4. The 
exact complete distribution of $\sum_{i=1}^{n} D_i^2$, and hence R, for $n \le 10$ has been tabulated 
by Kendall [49]. Table ST13 gives values of $R_\alpha$ for selected values of n and $\alpha$. 
Example 4. Let us first enumerate the null distribution of R for n = 3. This is 
done in the following table, where 

$$r = \frac{12 \sum_{i=1}^{n} i s_i}{n(n^2 - 1)} - \frac{3(n + 1)}{n - 1}.$$

  $(s_1, s_2, s_3)$    $\sum i s_i$    r 

  (1, 2, 3)            14              1.0 
  (1, 3, 2)            13              0.5 
  (2, 1, 3)            13              0.5 
  (2, 3, 1)            11             −0.5 
  (3, 1, 2)            11             −0.5 
  (3, 2, 1)            10             −1.0 

Thus 
$$P_{H_0}\{R = r\} = \begin{cases} \tfrac16, & r = 1.0, \\ \tfrac26, & r = 0.5, \\ \tfrac26, & r = -0.5, \\ \tfrac16, & r = -1.0. \end{cases}$$

Similarly, for n = 4 we have the following: 

  $(s_1, s_2, s_3, s_4)$                                       $\sum i s_i$    r 

  (1, 2, 3, 4)                                                 30              1.0 
  (1, 3, 2, 4), (2, 1, 3, 4), (1, 2, 4, 3)                     29              0.8 
  (2, 1, 4, 3)                                                 28              0.6 
  (1, 3, 4, 2), (1, 4, 2, 3), (2, 3, 1, 4), (3, 1, 2, 4)       27              0.4 
  (1, 4, 3, 2), (3, 2, 1, 4)                                   26              0.2 
                                                               25              0.0 

The last value is obtained from symmetry. 
Example 5. In Example 3 we see that 
$$r = \frac{12 \times 23}{4 \times 15} - \frac{3 \times 5}{3} = -0.4.$$

Since $P_{H_0}\{|R| \ge 0.4\} = 18/24 = 0.75$, we cannot reject $H_0$ at $\alpha = 0.05$ or 
$\alpha = 0.10$. 
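
The two expressions (20) and (21) for R, and the exact null probability used in Example 5, can be verified by a short computation. The sketch below assumes no ties and uses illustrative helper names; the null distribution is obtained by enumerating all permutations of the second ranking.

```python
from fractions import Fraction
from itertools import permutations

def ranks(values):
    """Ranks 1..n in increasing order (assumes distinct values)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_from_d(xs, ys):
    """Formula (21): R = 1 - 6 * sum(D_i^2) / (n (n^2 - 1))."""
    r, s = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(r, s))
    return 1 - Fraction(6 * d2, n * (n * n - 1))

def spearman_from_products(xs, ys):
    """Formula (20): R = 12 * sum(R_i S_i) / (n (n^2 - 1)) - 3 (n + 1) / (n - 1)."""
    r, s = ranks(xs), ranks(ys)
    n = len(xs)
    prod = sum(a * b for a, b in zip(r, s))
    return Fraction(12 * prod, n * (n * n - 1)) - Fraction(3 * (n + 1), n - 1)

judge_x = [3, 4, 2, 1]
judge_y = [3, 1, 4, 2]
r_obs = spearman_from_d(judge_x, judge_y)

base = [1, 2, 3, 4]
null_values = [spearman_from_d(base, list(p)) for p in permutations(base)]
p_value = Fraction(sum(1 for v in null_values if abs(v) >= abs(r_obs)),
                   len(null_values))
print(r_obs, p_value)
```

Both formulas return −2/5 for the judges’ rankings, and the enumeration gives 18/24 for the two-sided probability.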


For large samples it is possible to use a normal approximation. It can be shown 
(see, for example, Fraser [29, pp. 247-248]) that under $H_0$ the RV 

$$Z = \left(12 \sum_{i=1}^{n} i S_i - 3n^3\right) n^{-5/2}$$


or, equivalently, 
$$Z = R\sqrt{n - 1}$$
has approximately a standard normal distribution. The approximation is good for 
$n \ge 10$. 
PROBLEMS 13.5 


1. A sample of 240 men was classified according to characteristics A and B. 
Characteristic A was subdivided into four classes, $A_1$, $A_2$, $A_3$, and $A_4$, while B was 
subdivided into three classes, $B_1$, $B_2$, and $B_3$, with the following result: 


Is there evidence to support the theory that A and B are independent? 


2. The following data represent the blood types and ethnic groups of a sample of 
Iraqi citizens: 


Blood Type 
Ethnic Group O A B AB 
Kurd 531 450 293 226 
Arab 174 150 133 36 
Jew 42 26 26 8 
Turkoman 47 49 22 10 
Ossetian 50 59 26 15 


Is there evidence to conclude that blood type is independent of ethnic group? 


3. In a public opinion poll, a random sample of 500 American adults across the 
country was asked the following question: “Do you believe that there was a con- 
certed effort to cover up the Watergate scandal? Answer yes, no, or no opinion.” 
The responses according to political beliefs were as follows: 
Political                       Response 
Affiliation      Yes      No       No Opinion      Total 

Republican       45       75       30              150 
Independent      85       45       20              150 
Democrat         140      30       30              200 
Total            270      150      80              500 

Test the hypothesis that attitude toward the Watergate cover-up is independent of 
political party affiliation. 


4. A random sample of 100 families in Bowling Green, Ohio, showed the following 
distribution of home ownership by family income: 


Annual Income (dollars) 


Residential Less Than 30,000- 50,000 
Status 30,000 50,000 or Above 
Homeowner 10 15 30 
Renter 8 17 20 


Is home ownership in Bowling Green independent of family income? 


5. In a flower show the judges agreed that five exhibits were outstanding, and these 
were numbered arbitrarily from 1 to 5. Three judges each arranged these five 
exhibits in order of merit, giving the following rankings: 


Judge A: 5, 3, 1, 2, 4 
Judge B: 3, 1, 5, 4, 2 
JudgeC: 5, 2, 3, 1, 4 


Compute the average values of Spearman’s rank correlation coefficient R and 
Kendall’s sample tau coefficient T from the three possible pairs of rankings. 


6. For the bivariate normally distributed RV (X, Y), show that $\tau = 0$ if and only if 
X and Y are independent. [Hint: Show that $\tau = (2/\pi)\sin^{-1}\rho$, where $\rho$ is the 
correlation coefficient between X and Y.] 


7. Show that the distribution of Spearman’s rank correlation coefficient R is 
symmetric about 0 under $H_0$. 


8. In Problem 5, test the null hypothesis that rankings of judge A and judge C are 
independent. Use both Kendall’s tau and Spearman’s rank correlation tests. 


9. A random sample of 12 couples showed the following distribution of heights: 

            Height (in.)      |               Height (in.) 
Couple    Husband    Wife     |   Couple    Husband    Wife 
1         80         72       |   7         74         68 
2         70         60       |   8         71         71 
3         73         76       |   9         63         61 
4         72         62       |   10        64         65 
5         62         63       |   11        68         66 
6         65         46       |   12        67         67 


(a) Compute T. 
(b) Compute R. 


(c) Test the hypothesis that the heights of husband and wife are independent, 
using T as well as R. In each case use the normal approximation. 


13.6 SOME APPLICATIONS OF ORDER STATISTICS 


In this section we consider some applications of order statistics. We are mainly in- 
terested in three applications: tolerance intervals for distributions, coverages, and 
confidence interval estimates for quantiles and location parameters. 


Definition 1. Let F be a continuous DF. A tolerance interval for F with tolerance 
coefficient $\gamma$ is a random interval such that the probability is $\gamma$ that this random 
interval covers at least a specific percentage (100p) of the distribution. 


Let $X_1, X_2, \ldots, X_n$ be a sample of size n from F, and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ 
be the corresponding set of order statistics. If the endpoints of the tolerance interval 
are two order statistics $X_{(r)}$, $X_{(s)}$, r < s, we have 


(1) $P\{P\{X_{(r)} < X < X_{(s)}\} \ge p\} = \gamma$. 


Since F is continuous, F(X) is U(0, 1), and we have 


(2) $P\{X_{(r)} < X < X_{(s)}\} = P\{X < X_{(s)}\} - P\{X \le X_{(r)}\}$ 
$\quad = F(X_{(s)}) - F(X_{(r)})$ 
$\quad = U_{(s)} - U_{(r)}$, 


where $U_{(r)}$, $U_{(s)}$ are the order statistics from U(0, 1). Thus (1) reduces to 
(3) $P\{U_{(s)} - U_{(r)} \ge p\} = \gamma$. 


The statistic $V = U_{(s)} - U_{(r)}$, $1 \le r < s \le n$, is called the coverage of the 
interval $(X_{(r)}, X_{(s)})$. More precisely, the differences $V_k = F(X_{(k)}) - F(X_{(k-1)}) = 
U_{(k)} - U_{(k-1)}$, for $k = 1, 2, \ldots, n + 1$, where $X_{(0)} = -\infty$ and $X_{(n+1)} = \infty$, so that 
$U_{(0)} = 0$ and $U_{(n+1)} = 1$, are called elementary coverages. 
Since the joint PDF of $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ is given by 


$$f(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < u_2 < \cdots < u_n < 1, \\ 0, & \text{otherwise,} \end{cases}$$


the joint PDF of $V_1, V_2, \ldots, V_n$ is easily seen to be 


(4) $$h(v_1, v_2, \ldots, v_n) = \begin{cases} n!, & v_i \ge 0,\ i = 1, 2, \ldots, n,\ \sum_{i=1}^{n} v_i \le 1, \\ 0, & \text{otherwise.} \end{cases}$$


Note that h is symmetric in its arguments. Consequently, the $V_i$’s are exchangeable RVs 
and the distribution of every sum of r, r < n, of these coverages is the same, and in 
particular, it is the distribution of $U_{(r)} = \sum_{j=1}^{r} V_j$, namely, 


(5) $$g_r(u) = \begin{cases} n\dbinom{n-1}{r-1} u^{r-1}(1 - u)^{n-r}, & 0 < u < 1, \\ 0, & \text{otherwise.} \end{cases}$$

The common distribution of elementary coverages is 
$g_1(u) = n(1 - u)^{n-1}$, 0 < u < 1, and = 0 otherwise. 
Thus $EV_i = 1/(n + 1)$ and $\sum_{i=1}^{r} EV_i = r/(n + 1)$. This may be interpreted as 
follows: The order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ partition the area under the PDF in 
n + 1 parts such that each part has the same average (expected) area. 


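The sum of any $r$ successive elementary coverages $V_{i+1}, V_{i+2}, \ldots, V_{i+r}$ is called
an $r$-coverage. Clearly,

$$\sum_{j=1}^{r} V_{i+j} = U_{(i+r)} - U_{(i)}, \qquad i + r \le n, \tag{6}$$

and, in particular, $U_{(s)} - U_{(r)} = \sum_{j=r+1}^{s} V_j$. Since the $V$'s are exchangeable, it follows
that

$$U_{(s)} - U_{(r)} \stackrel{d}{=} U_{(s-r)} \tag{7}$$

with PDF

$$g_{s-r}(u) = n \binom{n-1}{s-r-1} u^{s-r-1} (1-u)^{n-s+r}, \qquad 0 < u < 1.$$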


From (3), therefore,

$$\gamma = \int_p^1 g_{s-r}(u)\, du = \sum_{i=0}^{s-r-1} \binom{n}{i} p^i (1-p)^{n-i}, \tag{8}$$

where the last equality follows from (5.3.48). Given $p$ and $\gamma$, it may not always be
possible to find $s - r$ to satisfy (8).


Example 1. Let $s = n$ and $r = 1$. Then

$$\gamma = \sum_{i=0}^{n-2} \binom{n}{i} p^i (1-p)^{n-i} = 1 - p^n - np^{n-1}(1-p).$$

If $p = 0.8$, $n = 5$, $r = 1$, then

$$\gamma = 1 - (0.8)^5 - 5(0.8)^4(0.2) = 0.263.$$

Thus the interval $(X_{(1)}, X_{(5)})$ in this case defines a 26 percent tolerance interval for
0.80 probability under the distribution (of $X$).


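Example 2. Let $X_1, X_2, X_3, X_4, X_5$ be a sample from a continuous DF $F$. Let us
find $r$ and $s$, $r < s$, such that $(X_{(r)}, X_{(s)})$ is a 90 percent tolerance interval for 0.50
probability under $F$. We have

$$0.90 = P\left\{U_{(s-r)} \ge \tfrac{1}{2}\right\} = \sum_{i=0}^{s-r-1} \binom{5}{i} \left(\tfrac{1}{2}\right)^5.$$

It follows that if we choose $s - r = 4$, then $\gamma = 0.81$; and if we choose $s - r = 5$,
then $\gamma = 0.969$. (With $n = 5$, the value $s - r = 5$ corresponds to $r = 0$, that is, to the
one-sided interval $(-\infty, X_{(5)})$ under the convention $X_{(0)} = -\infty$.) In this case we must
settle for an interval with tolerance coefficient 0.969, exceeding the desired value 0.90.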
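
The binomial sum in (8) is simple to evaluate directly. The following Python lines are a minimal sketch of that computation; the function name `tolerance_coefficient` and its `width` argument (standing for s - r) are illustrative choices, not notation from the text. Under those assumptions it should reproduce the values of the tolerance coefficient quoted in Examples 1 and 2.

```python
from math import comb

def tolerance_coefficient(n: int, width: int, p: float) -> float:
    """P{U_(s) - U_(r) >= p} with s - r = width, via the binomial sum in (8)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(width))

# Example 1: n = 5, interval (X_(1), X_(5)), p = 0.8
example_1 = tolerance_coefficient(5, 4, 0.8)

# Example 2: n = 5, p = 0.5, candidate widths s - r = 4 and s - r = 5
example_2 = [tolerance_coefficient(5, w, 0.5) for w in (4, 5)]

print(round(example_1, 3), [round(g, 4) for g in example_2])
```

Only the difference s - r enters the sum, which is why the function takes a single width rather than r and s separately.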


In general, given $p$, $0 < p < 1$, it is possible to choose a sufficiently large sample
size $n$ and a corresponding value of $s - r$ such that with probability $\ge \gamma$ an interval
of the form $(X_{(r)}, X_{(s)})$ covers at least $100p$ percent of the distribution. If $s - r$ is
specified as a function of $n$, one chooses the smallest sample size $n$.

Example 3. Let $p = \frac{3}{4}$ and $\gamma = 0.75$. Suppose that we want to choose the
smallest sample size required such that $(X_{(2)}, X_{(n)})$ covers at least 75 percent of the
distribution. Thus we want the smallest $n$ to satisfy

$$0.75 \le \sum_{i=0}^{n-3} \binom{n}{i} \left(\tfrac{3}{4}\right)^i \left(\tfrac{1}{4}\right)^{n-i}.$$

From Table ST1 of binomial distributions we see that $n = 14$.
We next consider the use of order statistics in constructing confidence intervals
for population quantiles. Let $X$ be an RV with a continuous DF $F$, and let $0 < p < 1$. Then
the quantile of order $p$, $\mathfrak{z}_p$, satisfies

$$F(\mathfrak{z}_p) = p. \tag{9}$$

Let $X_1, X_2, \ldots, X_n$ be $n$ independent observations on $X$. Then the number of
$X_i$'s $< \mathfrak{z}_p$ is an RV that has a binomial distribution with parameters $n$ and $p$.
Similarly, the number of $X_i$'s that are at least $\mathfrak{z}_p$ has a binomial distribution with
parameters $n$ and $1 - p$.

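Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the set of order statistics for the sample. Then

$$P\{X_{(r)} < \mathfrak{z}_p\} = P\{\text{at least } r \text{ of the } X_i\text{'s} < \mathfrak{z}_p\} = \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i}. \tag{10}$$

Similarly,

$$P\{X_{(s)} > \mathfrak{z}_p\} = P\{\text{at least } n - s + 1 \text{ of the } X_i\text{'s} > \mathfrak{z}_p\} = P\{\text{at most } s - 1 \text{ of the } X_i\text{'s} < \mathfrak{z}_p\} = \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}. \tag{11}$$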
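It follows from (10) and (11) that

$$\begin{aligned}
P\{X_{(r)} < \mathfrak{z}_p < X_{(s)}\} &= P\{X_{(r)} < \mathfrak{z}_p\} - P\{X_{(s)} < \mathfrak{z}_p\} \\
&= P\{X_{(r)} < \mathfrak{z}_p\} - 1 + P\{X_{(s)} > \mathfrak{z}_p\} \\
&= \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} + \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} - 1 \\
&= \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}.
\end{aligned} \tag{12}$$

It is easy to determine a confidence interval for $\mathfrak{z}_p$ from (12) once the confidence
level is given. In practice, one determines $r$ and $s$ such that $s - r$ is as small as
possible, subject to the condition that the level is at least $1 - \alpha$.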


Example 4. Suppose that we want a confidence interval for the median ($\mathfrak{z}_{1/2}$),
based on a sample of size 7 with confidence level 0.90. It suffices to find $r$ and $s$,
$r < s$, such that

$$\sum_{i=r}^{s-1} \binom{7}{i} \left(\tfrac{1}{2}\right)^7 \ge 0.90.$$

By trial and error, using the probability distribution $b(7, \frac{1}{2})$ we see that we can choose
$s = 7$, $r = 2$ or $r = 1$, $s = 6$; in either case $s - r$ is minimum ($= 5$), and the
confidence level is at least 0.92.
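
The trial-and-error search in Example 4 can be mechanized. The sketch below evaluates the sum in (12) and scans widths s - r from the smallest upward; the names `coverage` and `shortest_intervals` are hypothetical helpers introduced only for this illustration.

```python
from math import comb

def coverage(n, r, s, p):
    """P{X_(r) < z_p < X_(s)} from (12): sum over i = r, ..., s - 1."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))

def shortest_intervals(n, p, level):
    """All pairs (r, s) with the smallest s - r whose coverage is >= level."""
    for width in range(1, n):
        hits = [(r, r + width) for r in range(1, n - width + 1)
                if coverage(n, r, r + width, p) >= level]
        if hits:
            return hits
    return []  # no pair of order statistics reaches the requested level

# Example 4: median, n = 7, level 0.90
pairs = shortest_intervals(7, 0.5, 0.90)
print(pairs, [round(coverage(7, r, s, 0.5), 4) for r, s in pairs])
```

For the median the binomial weights are symmetric, so the two solutions found are mirror images of each other.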


Example 5. Let us compute the number of observations required for $(X_{(1)}, X_{(n)})$
to be a 0.95 level confidence interval for the median, that is, we want to find $n$ such
that

$$P\{X_{(1)} < \mathfrak{z}_{1/2} < X_{(n)}\} \ge 0.95.$$

It suffices to find $n$ such that

$$\sum_{i=1}^{n-1} \binom{n}{i} \left(\tfrac{1}{2}\right)^n \ge 0.95.$$

It follows from Table ST1 that $n = 6$.
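
For the extreme order statistics the sum in Example 5 collapses to 1 - 2(1/2)^n, since only the terms i = 0 and i = n are missing from the full binomial expansion. A short sketch of the resulting search follows; the function name `smallest_n_extremes` is an assumption made for illustration.

```python
def smallest_n_extremes(level: float) -> int:
    """Smallest n with P{X_(1) < median < X_(n)} = 1 - 2 * 0.5**n >= level."""
    n = 2
    while 1 - 2 * 0.5**n < level:
        n += 1
    return n

print(smallest_n_extremes(0.95))
```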


Finally, we consider applications of order statistics to constructing confidence in- 
tervals for a location parameter. For this purpose we use the method of test inversion 
discussed in Chapter 11. We first consider confidence estimation based on the sign 
test of location. 

Let $X_1, X_2, \ldots, X_n$ be a random sample from a symmetric, continuous DF $F(x - \theta)$
and suppose that we wish to find a confidence interval for $\theta$. Let $R^+(X - \theta_0)$ =
number of $X_i$'s $> \theta_0$ be the sign-test statistic for testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta \ne \theta_0$.
Clearly, $R^+(X - \theta_0) \sim b(n, \frac{1}{2})$ under $H_0$. The sign test rejects $H_0$ if

$$\min\{R^+(X - \theta_0),\ R^+(\theta_0 - X)\} \le c \tag{13}$$

for some integer $c$ to be determined from the level of the test. Let $r = c + 1$. Then any
value of $\theta$ is acceptable provided that it is greater than the $r$th smallest observation
and smaller than the $r$th largest observation, giving as the confidence interval

$$X_{(r)} < \theta < X_{(n+1-r)}. \tag{14}$$

If we want level $1 - \alpha$ to be associated with (14), we choose $c$ so that the level of test
(13) is $\alpha$.


Example 6. The following 12 observations come from a symmetric, continuous
DF $F(x - \theta)$:

-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122.

We wish to obtain a 95 percent confidence interval for $\theta$. The sign test rejects $H_0$ if
$R^+(X) \ge 10$ or $\le 2$ at level 0.05. Thus

$$P\{3 \le R^+(X - \theta) < 10\} = 1 - 2(0.0193) = 0.9614 \ge 0.95.$$

It follows that a 95 percent confidence interval for $\theta$ is given by $(X_{(3)}, X_{(10)})$ or
$(-380, 25)$.
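
The inversion of the sign test in (13) and (14) can be sketched in a few lines of Python. The function name `sign_test_interval` and its return convention are assumptions made for this illustration; the data are the 12 observations of Example 6, and the function picks the largest c for which the two-sided test has size at most the requested level.

```python
from math import comb

def sign_test_interval(xs, alpha):
    """Return (lower, upper, coverage) for the interval (X_(r), X_(n+1-r)).

    r = c + 1, where c is the largest integer such that the two-sided sign
    test rejecting when min(R+, n - R+) <= c has size at most alpha.
    """
    n = len(xs)
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]  # b(n, 1/2) probabilities
    c, tail = -1, 0.0
    # The tails {R+ <= c} and {R+ >= n - c} are disjoint while c < n/2.
    while c + 1 < n / 2 and 2 * (tail + pmf[c + 1]) <= alpha:
        c += 1
        tail += pmf[c]
    if c < 0:
        raise ValueError("sample too small for the requested level")
    r = c + 1
    ordered = sorted(xs)
    return ordered[r - 1], ordered[n - r], 1 - 2 * tail

data = [-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122]
lower, upper, level = sign_test_interval(data, 0.05)
print(lower, upper, round(level, 4))
```

With these data the sketch should return the third smallest and third largest observations, matching the interval obtained in Example 6.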


We next consider the Wilcoxon signed-ranks test of $H_0\colon \theta = \theta_0$ to construct a
confidence interval for $\theta$. The test statistic in this case is $T^+$ = sum of ranks of
positive $(X_i - \theta_0)$'s in the ordered $|X_i - \theta_0|$'s. From (13.3.4),

$$T^+ = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]} = \text{number of } \frac{X_i + X_j}{2} > \theta_0.$$

Let $T_{ij} = (X_i + X_j)/2$, $1 \le i \le j \le n$, and order the $N = \binom{n+1}{2}$ $T_{ij}$'s in
increasing order of magnitude

$$T_{(1)} \le T_{(2)} \le \cdots \le T_{(N)}.$$

Then using the argument that converts (13) to (14), we see that a confidence interval
for $\theta$ is given by

$$T_{(r)} < \theta < T_{(N+1-r)}. \tag{15}$$

Critical values $c$ are taken from Table ST10.


Example 7. For the data in Example 6, the Wilcoxon signed-rank test rejects
$H_0\colon \theta = \theta_0$ at level 0.05 if $T^+ > 64$ or $T^+ < 14$. Thus

$$P\{14 \le T^+(X - \theta) \le 64\} \ge 0.95.$$

It follows that a 95% confidence interval for $\theta$ is given by $[T_{(14)}, T_{(64)}] = [-336.5, -20]$.
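
The ordered averages $T_{(1)} \le \cdots \le T_{(N)}$ used in (15) are easy to generate by machine. In the sketch below the helper name `walsh_averages` is an illustrative choice, and the ranks 14 and 64 are simply copied from Example 7 (they come from Table ST10, which the code does not reproduce).

```python
from itertools import combinations_with_replacement

def walsh_averages(xs):
    """Sorted averages (X_i + X_j)/2 over all index pairs with i <= j."""
    return sorted((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))

data = [-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122]
t = walsh_averages(data)

# Ranks quoted in Example 7 (1-based), converted to 0-based list indices.
lower, upper = t[14 - 1], t[64 - 1]
print(len(t), lower, upper)
```

For n = 12 there are N = 78 averages, and the two selected order statistics should agree with the endpoints reported in Example 7.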


PROBLEMS 13.6 
1. Find the smallest values of $n$ such that the intervals (a) $(X_{(1)}, X_{(n)})$, and (b) $(X_{(2)},
X_{(n-1)})$ contain the median with probability $\ge 0.90$.

2. Find the smallest sample size required such that $(X_{(1)}, X_{(n)})$ covers at least 90
percent of the distribution with probability $\ge 0.98$.

3. Find the relation between $n$ and $p$ such that $(X_{(1)}, X_{(n)})$ covers at least $100p$
percent of the distribution with probability $\ge 1 - p$.

4. Given $\gamma$, $\delta$, $p_0$, $p_1$ with $p_1 > p_0$, find the smallest $n$ such that

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p_0\} \ge \gamma$$

and

$$P\{F(X_{(s)}) - F(X_{(r)}) \ge p_1\} \le \delta.$$

Find also $s - r$. [Hint: Use normal approximation to the binomial distribution.]

5. In Problem 4, find the smallest $n$ and the associated value of $s - r$ if $\gamma = 0.95$,
$\delta = 0.10$, $p_1 = 0.75$, $p_0 = 0.50$.

6. Let $X_1, X_2, \ldots, X_7$ be a random sample from a continuous DF $F$. Compute:
(a) $P(X_{(1)} < \mathfrak{z}_{0.5} < X_{(7)})$.
(b) P(X < 303 < X(5)). 
(c) P(X@) < 308 < X@))- 
7. Let $X_1, X_2, \ldots, X_n$ be iid with common continuous DF $F$.
(a) What is the distribution of

$$F(X_{(n-1)}) - F(X_{(j)}) + F(X_{(i)}) - F(X_{(2)})$$

for $2 \le i < j \le n - 1$?
(b) What is the distribution of [F (X(n)) — F(X) IF (Xa) — F(X)1? 


13.7 ROBUSTNESS 


Most of the statistical inference problems treated in this book are parametric in na- 
ture. We have assumed that the functional form of the distribution being sampled is 
known except for a finite number of parameters. It is to be expected that any estimator 
or test of hypothesis concerning the unknown parameter constructed on this assump- 
tion will perform better than the corresponding nonparametric procedure, provided 
that the underlying assumptions are satisfied. It is therefore of interest to know how 
well the parametric optimal tests or estimators constructed for one population per- 
form when the basic assumptions are modified. If we can construct tests or estima- 
tors that perform well for a variety of distributions, for example, there would be little 
point in using the corresponding nonparametric method unless the assumptions are 
seriously violated. 

In practice, one makes many assumptions in parametric inference, and any one or 
all of these may be violated. Thus one seldom has accurate knowledge about the true 
underlying distribution. Similarly, the assumption of mutual independence or even 
identical distribution may not hold. Any test or estimator that performs well under 
modifications of underlying assumptions is usually referred to as robust. 

In this section we first consider the effect that slight variations in model assumptions
have on some common parametric estimators and tests of hypotheses. Next
we consider some corresponding nonparametric competitors and show that they are
quite robust.
13.7.1 Effect of Deviations from Model Assumptions on
Some Parametric Procedures


Let us first consider the effect of contamination on the sample mean as an estimator of
the population mean. The most commonly used estimator of the population mean $\mu$ is
the sample mean $\bar X$. It has the property of unbiasedness for all populations with finite
mean. For many parent populations (normal, Poisson, Bernoulli, gamma, etc.) it is a
complete sufficient statistic and hence a UMVUE. Moreover, it is consistent and has
asymptotic normal distribution whenever the conditions of the central limit theorem
are satisfied. Nevertheless, the sample mean is affected by extreme observations,
and a single observation that is either too large or too small may make $\bar X$ worthless
as an estimator of $\mu$. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a sample from
some normal population. Occasionally, something happens to the system, and a wild
observation is obtained; that is, suppose that one is sampling from $\mathcal{N}(\mu, \sigma^2)$, say,
$100\alpha$ percent of the time and from $\mathcal{N}(\mu, k\sigma^2)$, where $k > 1$, $(1 - \alpha)100$ percent
of the time. Here both $\mu$ and $\sigma^2$ are unknown, and one wishes to estimate $\mu$. In this
case one is really sampling from the density function


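$$f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x), \tag{1}$$

where $f_0$ is the PDF of $\mathcal{N}(\mu, \sigma^2)$ and $f_1$ is the PDF of $\mathcal{N}(\mu, k\sigma^2)$. Clearly,

$$\bar X = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{2}$$

is still unbiased for $\mu$. If $\alpha$ is nearly 1, there is no problem since the underlying
distribution is nearly $\mathcal{N}(\mu, \sigma^2)$, and $\bar X$ is nearly the UMVUE of $\mu$ with variance
$\sigma^2/n$. If $1 - \alpha$ is large (i.e., not nearly 0), then, since one is sampling from $f$, the
variance of $X_i$ is $\sigma^2$ with probability $\alpha$ and is $k\sigma^2$ with probability $1 - \alpha$, and we
have

$$\operatorname{var}_f(\bar X) = \frac{1}{n} \operatorname{var}(X_1) = \frac{\sigma^2}{n} [\alpha + (1 - \alpha)k]. \tag{3}$$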


If $k(1 - \alpha)$ is large, $\operatorname{var}_f(\bar X)$ is large and we see that even an occasional wild
observation makes $\bar X$ subject to a sizable error. The presence of an occasional observation
from $\mathcal{N}(\mu, k\sigma^2)$ is frequently referred to as contamination. The problem is that we
do not know, in practice, the distribution of the wild observations, and hence we do
not know the PDF $f$. It is known that the sample median is a much better estimator
than the mean in the presence of extreme values. In the contamination model
discussed above, if we use $Z_{1/2}$, the sample median of the $X_i$'s, as an estimator of $\mu$
(which is the population median), then for large $n$

$$E(Z_{1/2} - \mu)^2 = \operatorname{var}(Z_{1/2}) \approx \frac{1}{4n} \frac{1}{[f(\mu)]^2} \tag{4}$$

(see Theorem 7.5.2 and Remark 7.5.7). Since

$$f(\mu) = \alpha f_0(\mu) + (1 - \alpha) f_1(\mu) = \frac{\alpha}{\sigma\sqrt{2\pi}} + (1 - \alpha) \frac{1}{\sigma\sqrt{2\pi k}} = \frac{1}{\sigma\sqrt{2\pi}} \left( \alpha + \frac{1 - \alpha}{\sqrt{k}} \right),$$

we have

$$\operatorname{var}(Z_{1/2}) \approx \frac{\pi\sigma^2}{2n} \frac{1}{\{\alpha + [(1 - \alpha)/\sqrt{k}]\}^2}. \tag{5}$$

As $k \to \infty$, $\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/2n\alpha^2$. If there is no contamination, $\alpha = 1$ and
$\operatorname{var}(Z_{1/2}) \approx \pi\sigma^2/2n$. Also,

$$\frac{\pi\sigma^2/2n\alpha^2}{\pi\sigma^2/2n} = \frac{1}{\alpha^2},$$

which will be close to 1 if $\alpha$ is close to 1. Thus the estimator $Z_{1/2}$ will not be greatly
affected by how large $k$ is, that is, how wild the observations are. We have

$$\frac{\operatorname{var}(\bar X)}{\operatorname{var}(Z_{1/2})} = \frac{2}{\pi} [\alpha + (1 - \alpha)k] \left( \alpha + \frac{1 - \alpha}{\sqrt{k}} \right)^2 \to \infty \quad \text{as } k \to \infty.$$

Indeed, $\operatorname{var}(\bar X) \to \infty$ as $k \to \infty$, whereas $\operatorname{var}(Z_{1/2}) \to \pi\sigma^2/2n\alpha^2$ as $k \to \infty$. One
can check that when $k = 9$ and $\alpha \approx 0.915$, the two variances are (approximately)
equal. As $k$ becomes larger than 9 or $\alpha$ smaller than 0.915, $Z_{1/2}$ becomes a better
estimator of $\mu$ than $\bar X$.
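
To see how the comparison between the mean and the median depends on the amount and severity of contamination, one can tabulate the two variance expressions. The sketch below codes (3) and the large-sample approximation (5); the function names and the default arguments `sigma2=1.0` and `n=1` are illustrative assumptions, and both cancel in the ratio.

```python
from math import pi, sqrt

def var_mean(alpha, k, sigma2=1.0, n=1):
    """Equation (3): variance of the sample mean under the contaminated model."""
    return sigma2 / n * (alpha + (1 - alpha) * k)

def var_median(alpha, k, sigma2=1.0, n=1):
    """Equation (5): large-sample variance of the sample median."""
    return pi * sigma2 / (2 * n) / (alpha + (1 - alpha) / sqrt(k)) ** 2

def variance_ratio(alpha, k):
    """var(mean) / var(median); sigma^2 and n cancel."""
    return var_mean(alpha, k) / var_median(alpha, k)

# alpha is the proportion of uncontaminated observations.
for k in (1, 9, 100, 10_000):
    print(k, round(variance_ratio(0.95, k), 3))
```

With no contamination the ratio equals 2/pi, and for fixed alpha below 1 it grows without bound as k increases, in line with the limit stated above.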

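There are other flaws as well. Suppose, for example, that $X_1, X_2, \ldots, X_n$ is a
sample from $U(0, \theta)$, $\theta > 0$. Then both $\bar X$ and $T(X) = (X_{(1)} + X_{(n)})/2$, where
$X_{(1)} = \min(X_1, \ldots, X_n)$, $X_{(n)} = \max(X_1, \ldots, X_n)$, are unbiased for $EX = \theta/2$.
Also, $\operatorname{var}_\theta(\bar X) = \operatorname{var}(X)/n = \theta^2/[12n]$, and one can show that $\operatorname{var}(T) = \theta^2/[2(n +
1)(n + 2)]$. It follows that the efficiency of $\bar X$ relative to that of $T$ is

$$\operatorname{eff}_\theta(\bar X \mid T) = \frac{\operatorname{var}_\theta(T)}{\operatorname{var}_\theta(\bar X)} = \frac{6n}{(n + 1)(n + 2)} < 1 \quad \text{if } n > 2.$$

In fact, $\operatorname{eff}_\theta(\bar X \mid T) \to 0$ as $n \to \infty$, so that in sampling from a uniform parent $\bar X$ is
much worse than $T$, even for moderately large values of $n$.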
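
The decay of this efficiency is quick to tabulate. The following sketch uses exact rational arithmetic and the two variance formulas quoted above, expressed in units of $\theta^2$; the function name `eff_mean_vs_midrange` is a label chosen for this illustration.

```python
from fractions import Fraction

def eff_mean_vs_midrange(n: int) -> Fraction:
    """var(T) / var(X-bar) = 6n / ((n + 1)(n + 2)) for a U(0, theta) sample."""
    var_mean = Fraction(1, 12 * n)                      # theta^2 / (12 n)
    var_midrange = Fraction(1, 2 * (n + 1) * (n + 2))   # theta^2 / (2 (n+1)(n+2))
    return var_midrange / var_mean

for n in (1, 2, 3, 10, 100):
    print(n, float(eff_mean_vs_midrange(n)))
```

The ratio equals 1 for n = 1 and n = 2 and falls below 1 from n = 3 onward.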

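Let us next turn our attention to the estimation of standard deviation. Let $X_1, X_2,
\ldots, X_n$ be a sample from $\mathcal{N}(\mu, \sigma^2)$. Then the MLE of $\sigma$ is

$$\hat\sigma = \left[ \frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{n} \right]^{1/2} = \left( \frac{n - 1}{n} \right)^{1/2} S. \tag{6}$$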
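Note that the lower bound for the variance of any unbiased estimator for $\sigma$ is $\sigma^2/2n$.
Although $\hat\sigma$ is not unbiased, the estimator

$$S_1 = \sqrt{\frac{n}{2}}\, \frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\, \hat\sigma = \sqrt{\frac{n-1}{2}}\, \frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\, S \tag{7}$$

is unbiased for $\sigma$. Also,

$$\operatorname{var}(S_1) = \sigma^2 \left\{ \frac{n-1}{2} \left[ \frac{\Gamma[(n-1)/2]}{\Gamma(n/2)} \right]^2 - 1 \right\} = \frac{\sigma^2}{2n} + O\left( \frac{1}{n^2} \right). \tag{8}$$

Thus the efficiency of $S_1$ (relative to the estimator with least variance $= \sigma^2/2n$) is

$$\operatorname{eff}(S_1) = \frac{\sigma^2/2n}{\operatorname{var}(S_1)} = \frac{1}{1 + \sigma^{-2} O(2/n)} < 1$$

and $\to 1$ as $n \to \infty$. For small $n$, the efficiency of $S_1$ is considerably smaller than
1. Thus, for $n = 2$, $\operatorname{eff}(S_1) = 1/[2(\pi - 2)] = 0.438$, and for $n = 3$, $\operatorname{eff}(S_1) =
\pi/[6(4 - \pi)] = 0.61$.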
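
The small-sample efficiencies can be checked numerically from the exact variance of $S_1$, namely $\sigma^2\{[(n-1)/2][\Gamma((n-1)/2)/\Gamma(n/2)]^2 - 1\}$. The sketch below is one way to do this in Python; the function name `eff_s1` is hypothetical, and the gamma ratio is formed on the log scale only as a precaution against overflow for large n.

```python
from math import exp, lgamma

def eff_s1(n: int) -> float:
    """(sigma^2 / 2n) / var(S_1) for a normal sample of size n >= 2."""
    # Gamma((n - 1)/2) / Gamma(n/2), computed via log-gamma.
    gamma_ratio = exp(lgamma((n - 1) / 2) - lgamma(n / 2))
    var_over_sigma2 = (n - 1) / 2 * gamma_ratio**2 - 1
    return 1 / (2 * n * var_over_sigma2)

for n in (2, 3, 5, 10, 50):
    print(n, round(eff_s1(n), 3))
```

The values for n = 2 and n = 3 should agree with the two closed forms given in the text, and the sequence should approach 1 from below as n grows.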

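Yet another estimator of $\sigma$ is the sample mean deviation

$$S_2 = \frac{1}{n} \sum_{i=1}^{n} |X_i - \bar X|. \tag{9}$$

Note that

$$E\left( \sqrt{\frac{\pi}{2}}\, \frac{1}{n} \sum_{i=1}^{n} |X_i - \mu| \right) = \sqrt{\frac{\pi}{2}}\, E|X_i - \mu| = \sigma$$

and

$$\operatorname{var}\left( \sqrt{\frac{\pi}{2}}\, \frac{1}{n} \sum_{i=1}^{n} |X_i - \mu| \right) = \frac{\pi - 2}{2n}\, \sigma^2. \tag{10}$$

If $n$ is large enough so that $\bar X \approx \mu$, we see that $S_3 = \sqrt{\pi/2}\, S_2$ is nearly unbiased
for $\sigma$ with variance $[(\pi - 2)/2n]\sigma^2$. The efficiency of $S_3$ is

$$\frac{\sigma^2/(2n)}{[(\pi - 2)/2n]\sigma^2} = \frac{1}{\pi - 2} < 1.$$

For large $n$, the efficiency of $S_1$ relative to $S_3$ is

$$\frac{\operatorname{var}(S_3)}{\operatorname{var}(S_1)} = \frac{[(\pi - 2)/2n]\sigma^2}{\sigma^2/2n + O(1/n^2)} = \frac{\pi - 2}{1 + O(1/n)}.$$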
Now suppose that there is some contamination. As before, let us suppose that for
a proportion $\alpha$ of the time we sample from $\mathcal{N}(\mu, \sigma^2)$, and for a proportion $1 - \alpha$ of
the time we get a wild observation from $\mathcal{N}(\mu, k\sigma^2)$, $k > 1$. Assuming that both $\mu$
and $\sigma^2$ are unknown, suppose that we wish to estimate $\sigma$. In the notation used above,
let

$$f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),$$

where $f_0$ is the PDF of $\mathcal{N}(\mu, \sigma^2)$ and $f_1$ is the PDF of $\mathcal{N}(\mu, k\sigma^2)$. Let us see how
even small contamination can make the maximum likelihood estimator $\hat\sigma$ of $\sigma$ quite
useless.

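If $\hat\theta$ is the MLE of $\theta$, and $g$ is a function of $\theta$, then $g(\hat\theta)$ is the MLE of $g(\theta)$. In
view of (7.5.7) we get

$$E(\hat\sigma - \sigma)^2 \approx \frac{1}{4\sigma^2}\, E(\hat\sigma^2 - \sigma^2)^2. \tag{11}$$

Using Theorem 7.3.5, we see that

$$E(\hat\sigma^2 - \sigma^2)^2 \approx \frac{\mu_4 - \mu_2^2}{n} \tag{12}$$

(dropping the other two terms with $n^2$ and $n^3$ in the denominator), so that

$$E(\hat\sigma - \sigma)^2 \approx \frac{1}{4\sigma^2 n} (\mu_4 - \mu_2^2). \tag{13}$$

For the density $f$, we see that

$$\mu_4 = 3\sigma^4 [\alpha + k^2(1 - \alpha)] \tag{14}$$

and

$$\mu_2 = \sigma^2 [\alpha + k(1 - \alpha)]. \tag{15}$$

It follows that

$$E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{4n} \left\{ 3[\alpha + k^2(1 - \alpha)] - [\alpha + k(1 - \alpha)]^2 \right\}. \tag{16}$$

If we are interested in the effect of very small contamination, $\alpha \approx 1$ and $1 - \alpha \approx 0$.
Assuming that $k(1 - \alpha) \approx 0$, we see that

$$E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{4n} \left\{ 3[1 + k^2(1 - \alpha)] - 1 \right\} = \frac{\sigma^2}{2n} \left[ 1 + \tfrac{3}{2} k^2 (1 - \alpha) \right]. \tag{17}$$

In the normal case, $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$, so that from (11)

$$E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{2n}.$$

Thus we see that the mean square error due to a small contamination is now
multiplied by a factor $[1 + \frac{3}{2} k^2 (1 - \alpha)]$. If, for example, $k = 10$, $\alpha = 0.99$, then
$1 + \frac{3}{2} k^2 (1 - \alpha) = 2.5$. If $k = 10$, $\alpha = 0.98$, then $1 + \frac{3}{2} k^2 (1 - \alpha) = 4$, and so on.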
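
The inflation of the mean square error can be evaluated both from the full expression (16) and from its small-contamination simplification (17), each divided by the uncontaminated value $\sigma^2/2n$. The sketch below does this; the function names are illustrative, and the approximate factor is coded as $1 + \frac{3}{2}k^2(1-\alpha)$, which is what expanding $3[1 + k^2(1-\alpha)] - 1$ and dividing by 2 gives.

```python
def mse_factor_exact(alpha, k):
    """Expression (16) divided by sigma^2 / (2n)."""
    m4 = 3 * (alpha + k**2 * (1 - alpha))   # mu_4 / sigma^4, from (14)
    m2 = alpha + k * (1 - alpha)            # mu_2 / sigma^2, from (15)
    return (m4 - m2**2) / 2

def mse_factor_approx(alpha, k):
    """Small-contamination approximation (17) divided by sigma^2 / (2n)."""
    return 1 + 1.5 * k**2 * (1 - alpha)

for alpha in (1.0, 0.98, 0.95):
    print(alpha, round(mse_factor_exact(alpha, 10), 3),
          round(mse_factor_approx(alpha, 10), 3))
```

For k = 10 and alpha = 0.98 the approximation gives a factor of 4, and the exact expression is somewhat smaller because it retains the term in k(1 - alpha).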

A quick comparison with $S_3$ shows that although $S_1$ (or even $\hat\sigma$) is a better
estimator of $\sigma$ than $S_3$ if there is no contamination, $S_3$ becomes a much better estimator
in the presence of contamination as $k$ becomes large.

Next we consider the effect of deviation from model assumptions on tests of
hypotheses. One of the most commonly used tests in statistics is Student's t-test for
testing the mean of a normal population when the variance is unknown. Let $X_1, X_2,
\ldots, X_n$ be a sample from some population with mean $\mu$ and finite variance $\sigma^2$. As
usual, let $\bar X$ denote the sample mean, and $S^2$, the sample variance. If the population
being sampled is normal, the t-test rejects $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \ne \mu_0$ at level
$\alpha$ if $|\bar X - \mu_0| > t_{n-1,\alpha/2}(S/\sqrt{n})$. If $n$ is large, we replace $t_{n-1,\alpha/2}$ by the
corresponding critical value, $z_{\alpha/2}$, under the standard normal law. If the sample does not come
from a normal population, the statistic $T = [(\bar X - \mu_0)/S]\sqrt{n}$ is no longer distributed
as a $t(n - 1)$ statistic. If, however, $n$ is sufficiently large, we know that $T$ has an
asymptotic normal distribution irrespective of the population being sampled, as long
as it has a finite variance. Thus, for large $n$, the distribution of $T$ is independent of
the form of the population, and the t-test is stable. The same considerations apply
to testing the difference between two means when the two variances are equal.
Although we assumed that $n$ is sufficiently large for Slutsky's result (Theorem 6.2.15)
to hold, empirical investigations have shown that the test based on Student's statistic
is robust. Thus a significant value of $t$ may not be interpreted to mean a departure
from normality of the observations.

Let us next consider the effect of departure from
independence on the t-distribution. Suppose that the observations $X_1, X_2, \ldots, X_n$
have a multivariate normal distribution with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and $\rho$ as the
common correlation coefficient between any $X_i$ and $X_j$, $i \ne j$. Then

$$E\bar X = \mu, \quad \text{and} \quad \operatorname{var}(\bar X) = \frac{\sigma^2}{n} [1 + (n - 1)\rho], \tag{18}$$

and since the $X_i$'s are exchangeable it follows from Remark 7.3.1 that

$$ES^2 = \sigma^2 (1 - \rho). \tag{19}$$

For large $n$, the statistic $\sqrt{n}(\bar X - \mu_0)/S$ will be asymptotically distributed as
$\mathcal{N}(0, 1 + n\rho/(1 - \rho))$, instead of $\mathcal{N}(0, 1)$. Under $H_0$ and $\rho = 0$, $T^2 = n(\bar X - \mu_0)^2/S^2$
is distributed as $F(1, n - 1)$. Consider the ratio

$$\frac{nE(\bar X - \mu_0)^2}{ES^2} = \frac{\sigma^2 [1 + (n - 1)\rho]}{\sigma^2 (1 - \rho)} = 1 + \frac{n\rho}{1 - \rho}. \tag{20}$$
The ratio equals 1 if $\rho = 0$ but is $> 1$ for $\rho > 0$ and $\to \infty$ as $\rho \to 1$. It follows that
a large value of $T$ is likely to occur when $\rho > 0$ and is large, even though $\mu_0$ is the
true value of the mean. Thus a significant value of $t$ may be due to departure from
independence, and the effect can be serious.
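
A small simulation can illustrate the ratio of expectations discussed above, $1 + n\rho/(1-\rho)$. The sketch below is only an illustration under stated assumptions: equicorrelated standard normal observations are generated as $\sqrt{\rho}\,Z_0 + \sqrt{1-\rho}\,Z_i$ (which requires $\rho \ge 0$), the true mean is taken to be $\mu_0 = 0$, and the function names, seed, and number of replications are arbitrary choices.

```python
import random

def theoretical_ratio(n, rho):
    """n E(X-bar - mu0)^2 / E S^2 = 1 + n * rho / (1 - rho)."""
    return 1 + n * rho / (1 - rho)

def simulated_ratio(n, rho, reps=20000, seed=0):
    """Monte Carlo estimate of the same ratio for 0 <= rho < 1, mu0 = 0."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(reps):
        shared = rng.gauss(0, 1)  # common component inducing correlation rho
        xs = [rho**0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
              for _ in range(n)]
        xbar = sum(xs) / n
        num += n * xbar**2
        den += sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return num / den

print(theoretical_ratio(10, 0.3), simulated_ratio(10, 0.3))
```

The simulated value should be close to the theoretical one, and both exceed 1 whenever the common correlation is positive.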

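Next, consider a test of the null hypothesis $H_0\colon \sigma = \sigma_0$ against $H_1\colon \sigma \ne \sigma_0$.
Under the usual normality assumptions on the observations $X_1, X_2, \ldots, X_n$, the test
statistic used is

$$V = \frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{\sigma^2}, \tag{21}$$

which has a $\chi^2(n - 1)$ distribution under $H_0$. The usual test is to reject $H_0$ if

$$V_0 = \frac{(n - 1)S^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2} \quad \text{or} \quad V_0 < \chi^2_{n-1,1-\alpha/2}. \tag{22}$$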
Let us suppose that $X_1, X_2, \ldots, X_n$ are not normal. It follows from Corollary 2 of
Theorem 7.3.5 that

$$\operatorname{var}(S^2) = \frac{\mu_4}{n} + \frac{3 - n}{n(n - 1)}\, \mu_2^2, \tag{23}$$

so that

$$\operatorname{var}\left( \frac{S^2}{\sigma^2} \right) = \frac{1}{n} \frac{\mu_4}{\sigma^4} + \frac{3 - n}{n(n - 1)}. \tag{24}$$

Writing $\gamma_2 = (\mu_4/\sigma^4) - 3$, we have

$$\operatorname{var}\left( \frac{S^2}{\sigma^2} \right) = \frac{\gamma_2}{n} + \frac{2}{n - 1} \tag{25}$$

when the $X_i$'s are not normal, and

$$\operatorname{var}\left( \frac{S^2}{\sigma^2} \right) = \frac{2}{n - 1} \tag{26}$$

when the $X_i$'s are normal ($\gamma_2 = 0$). Now $(n - 1)S^2 = \sum_{j=1}^{n} (X_j - \bar X)^2$ is the sum
of $n$ identically distributed but dependent RVs $(X_j - \bar X)^2$, $j = 1, 2, \ldots, n$. Using
a version of the central limit theorem for dependent RVs (see, e.g., Cramér [16, p.
365]), it follows that

$$\left( \frac{n - 1}{2} \right)^{1/2} \left( \frac{S^2}{\sigma_0^2} - 1 \right),$$

under $H_0$, is asymptotically $\mathcal{N}(0, 1 + (\gamma_2/2))$, and not $\mathcal{N}(0, 1)$ as under the normal
theory. As a result, the size of the test based on the statistic $V_0$ will be different from
the stated level of significance if $\gamma_2$ differs greatly from 0. It is clear that the effect
of violation of the normality assumption can be quite serious on inferences about
variances, and the chi-square test is not robust.
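
The size distortion can be quantified, at least roughly, from the asymptotic variance $1 + \gamma_2/2$. The sketch below assumes that the standardized statistic is compared with the usual standard normal critical values while its true large-sample distribution is normal with mean 0 and variance $1 + \gamma_2/2$; the function name `approx_true_size` is hypothetical, and the two kurtosis values used ($\gamma_2 = -1.2$ for the uniform and $\gamma_2 = 3$ for the double exponential) are standard facts rather than figures from the text.

```python
from statistics import NormalDist

def approx_true_size(gamma2, alpha=0.05):
    """Large-sample rejection probability of the nominal level-alpha two-sided
    variance test when the kurtosis coefficient is gamma2 (> -2)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)   # nominal critical value
    sd = (1 + gamma2 / 2) ** 0.5            # actual asymptotic SD of the statistic
    return 2 * (1 - std_normal.cdf(z / sd))

for label, g2 in (("normal", 0.0), ("uniform", -1.2), ("double exponential", 3.0)):
    print(label, round(approx_true_size(g2), 4))
```

Under these assumptions a nominal 5 percent test is far too liberal for heavy-tailed parents and far too conservative for light-tailed ones.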

In the discussion above we have used somewhat crude calculations to investigate 
the behavior of the most commonly used estimators and test statistics when one or 
more of the underlying assumptions are violated. Our purpose here was to indicate 
that some tests or estimators are robust, whereas others are not. The moral is clear: 
One should check carefully to see that the underlying assumptions are satisfied be- 
fore using parametric procedures. 


13.7.2 Some Robust Procedures 


Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous PDF $f(x - \theta)$, $\theta \in \mathcal{R}$,
and assume that $f$ is symmetric about 0. We shall be interested in estimation or tests
of hypotheses concerning $\theta$. Our objective is to find procedures that perform well for
several different types of distributions but do not have to be optimal for any particular
distribution. We will call such procedures robust. We first consider estimation of $\theta$.
The estimators fall under one of the following three types:

1. Estimators that are functions of $R = (R_1, R_2, \ldots, R_n)$, where $R_j$ is the
rank of $X_j$, are known as R-estimators. Hodges and Lehmann [41] devised
a method of deriving such estimators from rank tests. These include the
sample median $\tilde X$ (based on the sign test), and $W = \operatorname{med}\{(X_i + X_j)/2,\ 1 \le i \le
j \le n\}$ based on the Wilcoxon signed-rank test.

2. Estimators of the form $\sum_{j=1}^{n} a_j X_{(j)}$ are called L-estimators, being linear
combinations of order statistics. This class includes the median, the mean, and the
trimmed mean obtained by dropping a prespecified proportion of extreme
observations.

3. Maximum likelihood type estimators obtained as solutions to certain
equations $\sum_{j=1}^{n} \psi(X_j - \theta) = 0$ are called M-estimators. The function $\psi(t) =
-f'(t)/f(t)$ gives MLEs.


Definition 1. Let $k = [n\alpha]$ be the largest integer $\le n\alpha$, where $0 < \alpha < \frac{1}{2}$. Then
the estimator

$$\bar X_\alpha = \sum_{j=k+1}^{n-k} \frac{X_{(j)}}{n - 2k} \tag{27}$$

is called a trimmed mean.

Two extreme examples of trimmed means are the sample mean $\bar X$ ($\alpha = 0$) and the
median $\tilde X$ when all except the central ($n$ odd) or the two central ($n$ even) observations
are excluded.
Example 1. Consider the following sample of size 15 taken from a symmetric
distribution.

0.97 0.66 0.73 0.78 1.30 0.58 0.79 0.94
0.52 0.52 0.83 1.25 1.47 0.96 0.71

Suppose that $\alpha = 0.10$. Then $k = [n\alpha] = 1$ and

$$\bar x_{0.10} = \sum_{j=2}^{14} \frac{x_{(j)}}{13} = 0.848.$$

Here $\bar x = 0.867$, $\operatorname{med}_{1 \le j \le 15} x_j = x_{(8)} = 0.79$.
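
The estimators discussed in this section are straightforward to compute for the sample of Example 1. The sketch below is a plain-Python illustration; the helper names `trimmed_mean` and `hodges_lehmann` are choices made here, the latter being the median of the averages $(X_i + X_j)/2$, $i \le j$.

```python
from itertools import combinations_with_replacement
from math import floor
from statistics import mean, median

def trimmed_mean(xs, alpha):
    """Trimmed mean of Definition 1: drop k = [n * alpha] values from each end."""
    n = len(xs)
    k = floor(n * alpha)  # note: n * alpha is a float product, so exotic alphas may round
    kept = sorted(xs)[k:n - k]
    return sum(kept) / (n - 2 * k)

def hodges_lehmann(xs):
    """Median of all pairwise averages (X_i + X_j)/2 with i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))

sample = [0.97, 0.66, 0.73, 0.78, 1.30, 0.58, 0.79, 0.94,
          0.52, 0.52, 0.83, 1.25, 1.47, 0.96, 0.71]

print(round(mean(sample), 3), median(sample),
      round(trimmed_mean(sample, 0.10), 3), round(hodges_lehmann(sample), 3))
```

With $\alpha = 0.10$ and n = 15 exactly one observation is removed from each end, so the trimmed mean averages the 13 central order statistics.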


We limit this discussion to four estimators of location: the sample median,
trimmed mean, sample mean, and Hodges-Lehmann estimator based on the Wilcoxon
signed-rank test. To compare the performance of two procedures $A$ and $B$, we use a
(large-sample) measure of relative efficiency due to Pitman. Pitman's asymptotic
relative efficiency (ARE) of procedure $B$ relative to procedure $A$ is the limit of the ratio
of sample sizes $n_A/n_B$, where $n_A$ and $n_B$ are sample sizes needed for procedures
$A$ and $B$ to perform equivalently with respect to a specified criterion. For example,
suppose that $\{T_{n(A)}\}$ and $\{T_{n(B)}\}$ are two sequences of estimators for $\psi(\theta)$ such that

$$T_{n(A)} \sim AN\left( \psi(\theta),\ \frac{\sigma_A^2(\theta)}{n(A)} \right)$$

and

$$T_{n(B)} \sim AN\left( \psi(\theta),\ \frac{\sigma_B^2(\theta)}{n(B)} \right).$$

Suppose further that $A$ and $B$ perform equivalently if their asymptotic variances are
the same, that is,

$$\frac{\sigma_A^2(\theta)}{n(A)} = \frac{\sigma_B^2(\theta)}{n(B)}.$$

Then

$$\frac{n(A)}{n(B)} = \frac{\sigma_A^2(\theta)}{\sigma_B^2(\theta)}.$$

Clearly, different performance measures may lead to different measures of ARE.

Similarly, if procedures $A$ and $B$ lead to two sequences of tests, then ARE is the
limiting ratio of the sample sizes needed by the tests to reach a certain power $\beta_0$
against the same alternative and at the same limiting level $\alpha$.
Accordingly, let $e(B, A)$ denote the ARE of $B$ relative to $A$. If $e(B, A) = \frac{1}{2}$, say,
then procedure $A$ requires (approximately) half as many observations as procedure
$B$. We will write $e_F(B, A)$ whenever necessary to indicate the dependence of ARE
on the underlying DF $F$.

For detailed discussion of Pitman efficiency we refer to Lehmann [59, pp. 371-380],
Lehmann [61, Sec. 5.2], Randles and Wolfe [83, Chap. 5], Serfling [100,
Chap. 10], and Zacks [120]. The expressions for AREs of the median and the
Hodges-Lehmann estimators of location parameter $\theta$ with respect to the sample mean $\bar X$ are

$$e_F(\tilde X, \bar X) = 4\sigma_F^2 f^2(0) \tag{28}$$

and

$$e_F(W, \bar X) = 12\sigma_F^2 \left[ \int_{-\infty}^{\infty} f^2(x)\, dx \right]^2, \tag{29}$$

where $f$ is the PDF corresponding to $F$. To get $e_F(\tilde X, W)$ we use the fact that

$$e_F(\tilde X, W) = \frac{e_F(\tilde X, \bar X)}{e_F(W, \bar X)} = \frac{f^2(0)}{3 \left[ \int_{-\infty}^{\infty} f^2(x)\, dx \right]^2}. \tag{30}$$

Bickel [4] showed that

$$e_F(\bar X_\alpha, \bar X) = \frac{\sigma_F^2}{\sigma_\alpha^2}, \tag{31}$$

where

$$\sigma_\alpha^2 = \frac{2}{(1 - 2\alpha)^2} \left[ \int_0^{\mathfrak{z}_{1-\alpha}} t^2 f(t)\, dt + \alpha\, \mathfrak{z}_{1-\alpha}^2 \right] \tag{32}$$

and $\mathfrak{z}_\alpha$ is the unique $\alpha$th percentile of $F$. It is clear from (32) that no closed-form
expression for $e_F(\bar X_\alpha, \bar X)$ is possible for most DFs $F$.
In the following table we give the AREs for some selected $F$.
ARE Computations for Selected F

F                                                   $e(\tilde X, \bar X)$     $e(W, \bar X)$      $e(\tilde X, W)$
$U(-\frac{1}{2}, \frac{1}{2})$                      1/3                       1                   1/3
$\mathcal{N}(0, 1)$                                 $2/\pi = 0.637$           $3/\pi = 0.955$     2/3
Logistic, $f(x) = e^{-x}(1 + e^{-x})^{-2}$          $\pi^2/12 = 0.822$        1.10                0.748
Double exponential, $f(x) = \frac{1}{2}\exp(-|x|)$  2                         1.5                 4/3
$C(0, 1)$                                           $\infty$                  $\infty$            4/3
It can be shown that $e_F(\tilde X, \bar X) \ge \frac{1}{3}$ for all symmetric $F$, so $\tilde X$ is quite inefficient
compared to $\bar X$ for $U(-\frac{1}{2}, \frac{1}{2})$. Even for normal $F$, $\tilde X$ would require 157 observations
to achieve the same accuracy that $\bar X$ achieves with 100 observations. For
heavier-tailed distributions, however, $\tilde X$ provides more protection than $\bar X$.

The values of $e(W, \bar X)$, on the other hand, are quite high for most $F$ and, in fact,
$e_F(W, \bar X) \ge 0.864$ for all symmetric $F$. Even for normal $F$ one loses little (4.5%)
in using $W$ instead of $\bar X$. Thus $W$ is more robust as an estimator of $\theta$.

A look at the values of $e(\tilde X, W)$ shows that $\tilde X$ is worse than $W$ for distributions
with light tails but does slightly better than $W$ for heavier-tailed $F$.
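
The three efficiency formulas (28), (29), and (30) need only the variance, the density at 0, and the integral of the squared density. The sketch below plugs in standard closed-form values of these three quantities for four of the distributions considered; the dictionary `DENSITIES` and the function `are_values` are illustrative names, and the closed forms are supplied here as known facts rather than taken from the text.

```python
from math import pi, sqrt

# name: (variance, f(0), integral of f(x)^2 dx)
DENSITIES = {
    "uniform(-1/2, 1/2)": (1 / 12, 1.0, 1.0),
    "normal(0, 1)": (1.0, 1 / sqrt(2 * pi), 1 / (2 * sqrt(pi))),
    "logistic": (pi**2 / 3, 0.25, 1 / 6),
    "double exponential": (2.0, 0.5, 0.25),
}

def are_values(variance, f0, int_f2):
    """Return (e(median, mean), e(W, mean), e(median, W))."""
    e_median_mean = 4 * variance * f0**2       # formula (28)
    e_w_mean = 12 * variance * int_f2**2       # formula (29)
    e_median_w = f0**2 / (3 * int_f2**2)       # formula (30)
    return e_median_mean, e_w_mean, e_median_w

for name, params in DENSITIES.items():
    print(name, [round(v, 3) for v in are_values(*params)])
```

The Cauchy distribution is left out because its variance is infinite, so only the ratio in (30) is finite there.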

Let us now compare the AREs of $\bar X_\alpha$, $\bar X$, and $W$. The following AREs for selected
$\alpha$ are due to Bickel [4].

ARE Comparisons

                        $\alpha = 0.01$                                   $\alpha = 0.05$
F                       $e(\bar X_\alpha, \bar X)$   $e(W, \bar X_\alpha)$   $e(\bar X_\alpha, \bar X)$   $e(W, \bar X_\alpha)$
Uniform                 0.96                         1.04                    0.83                         1.20
Normal                  0.995                        0.96                    0.97                         0.985
Double exponential      1.06                         1.41                    1.21                         1.24
Cauchy                  $\infty$                     6.72                    $\infty$                     2.67

We note that $\bar X_\alpha$ performs quite well compared to $\bar X$. In fact, for the normal
distribution the efficiency is quite close to 1, so there is little loss in using $\bar X_\alpha$. For
heavier-tailed distributions, $\bar X_\alpha$ is preferable. For small values of $\alpha$, it should be noted that
$\bar X_\alpha$ does not differ much from $\bar X$. Nevertheless, $\bar X_\alpha$ is more robust; it cannot do much
worse than $\bar X$ but can do much better. Compared to the Hodges-Lehmann estimator,
$\bar X_\alpha$ does not perform as well. It ($W$) provides better protection against outliers (heavy
tails) and gives up little in the normal case.

Finally, we consider testing $H_0\colon \theta = \theta_0$ against $H_1\colon \theta > \theta_0$. Recall that $X_1, X_2,
\ldots, X_n$ are iid with common continuous symmetric DF $F(x - \theta)$, $\theta \in \mathcal{R}$, and PDF
$f(x - \theta)$. Suppose that $\sigma_F^2 = \operatorname{var}(X_i) < \infty$. Let $S$ denote the sign test based on
the statistic $R^+(X) = \sum_{i=1}^{n} I_{[X_i > \theta_0]}$, $W$ denote the Wilcoxon signed-rank test based
on the statistic $T^+(X) = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]}$, $M$ denote the test based on the
Z-statistic $Z = \sqrt{n}(\bar X - \theta_0)/\sigma_F$, and $t$ denote Student's t-test based on the statistic
$\sqrt{n}(\bar X - \theta_0)/S$, where $S^2$ is the sample variance.

First note that $e(t, M) = 1$. Next we note that $e_F(S, t) = e_F(\tilde X, \bar X)$, $e_F(W, t) =
e_F(W, \bar X)$, so that AREs are the same as given in (28), (29), and (30), and values of
ARE given in the table for various $F$ remain the same for corresponding tests.

Similar remarks apply as for the case of estimation of @. The sign test is not as 
efficient as the Wilcoxon signed-rank test. But for heavier-tailed distributions such as 
Cauchy and double exponential, the sign test does better than the Wilcoxon signed- 
rank test. 


PROBLEMS 13.7 


1. Let $(X_1, X_2, \ldots, X_n)$ be jointly normal with $EX_i = \mu$, $\operatorname{var}(X_i) = \sigma^2$, and
$\operatorname{cov}(X_i, X_j) = \rho\sigma^2$ if $|i - j| = 1$, $i \ne j$, and $= 0$ otherwise.

(a) Show that

$$\operatorname{var}(\bar X) = \frac{\sigma^2}{n} \left[ 1 + 2\rho \left( 1 - \frac{1}{n} \right) \right]$$

and

$$E(S^2) = \sigma^2 \left( 1 - \frac{2\rho}{n} \right).$$

(b) Show that the t-statistic $\sqrt{n}(\bar X - \mu)/S$ is asymptotically normally distributed
with mean 0 and variance $1 + 2\rho$. Conclude that the significance of $t$ is
overestimated for positive values of $\rho$ and underestimated for $\rho < 0$ in large
samples.

(c) For finite $n$, consider the statistic

$$T^2 = \frac{n(\bar X - \mu)^2}{S^2}.$$

Compare the expected values of the numerator and the denominator of $T^2$
and study the effect of $\rho \ne 0$ to interpret significant $t$ values. (Scheffé
[99, p. 338])
2. Let $X_1, X_2, \ldots, X_n$ be a random sample from $G(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$.
(a) Show that

$$\mu_2 = \alpha\beta^2, \quad \text{and} \quad \mu_4 = 3\alpha(\alpha + 2)\beta^4.$$

(b) Show that

$$\operatorname{var}\left\{ \frac{(n - 1)S^2}{\sigma^2} \right\} \approx (n - 1) \left( 2 + \frac{6}{\alpha} \right).$$

(c) Show that the large sample distribution of $(n - 1)S^2/\sigma^2$ is normal.

(d) Compare the large-sample test of $H_0\colon \sigma = \sigma_0$ based on the asymptotic
normality of $(n - 1)S^2/\sigma^2$ with the large-sample test based on the same statistic
when the observations are taken from a normal population. In particular, take
$\alpha = 2$.

3. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples
from populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
Let $\bar X, \bar Y$ be the two sample means, and $S_1^2, S_2^2$ be the two sample variances.
Write $N = m + n$, $R = m/n$, and $\theta = \sigma_1^2/\sigma_2^2$. The usual normal theory test
of $H_0\colon \mu_1 - \mu_2 = \delta_0$ is the t-test based on the statistic

$$T = \frac{\bar X - \bar Y - \delta_0}{S_p (1/m + 1/n)^{1/2}},$$

where

$$S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2}.$$

Under $H_0$, the statistic $T$ has a t-distribution with $N - 2$ d.f., provided that $\sigma_1^2 =
\sigma_2^2$. Show that the asymptotic distribution of $T$ in the nonnormal case is $\mathcal{N}(0, (\theta +
R)(1 + R\theta)^{-1})$ for large $m$ and $n$. Thus if $R = 1$, $T$ is asymptotically $\mathcal{N}(0, 1)$ as
in the normal theory case assuming equal variances, even though the two samples
come from nonnormal populations with unequal variances. Conclude that the test
is robust in the case of large, equal sample sizes. (Scheffé [99, p. 339])


4. Verify the ARE computations for $F$ in the table above using the expressions of
ARE in (28), (29), and (30).


5. Suppose that $F$ is a $G(\alpha, \beta)$ RV. Show that

$$e(W, \bar X) = \frac{3\alpha\, \Gamma^2(2\alpha)}{2^{4(\alpha - 1)} (2\alpha - 1)^2 [\Gamma(\alpha)]^4}.$$

(Note that $F$ is not symmetric.)


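6. Suppose that $F$ has PDF

$$f(x) = \frac{\Gamma(m)}{\Gamma(1/2)\, \Gamma((2m - 1)/2)\, (1 + x^2)^m}, \qquad -\infty < x < \infty,$$

for $m \ge 1$. Compute $e(\tilde X, \bar X)$, $e(W, \bar X)$, and $e(\tilde X, W)$. (From Problem 3.2.3,
$E|X|^k < \infty$ if $k < 2m - 1$.)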
Frequently Used Symbols
and Abbreviations


$\Rightarrow$ implies

$\Leftrightarrow$ implies and is implied by

$\to$ converges to

$\uparrow, \downarrow$ increasing, decreasing

ry nonincreasing, nondecreasing 

$\Gamma(x)$ gamma function

$\overline{\lim}$, $\underline{\lim}$, $\lim$ limit superior, limit inferior, limit

$\mathcal{R}$, $\mathcal{R}_n$ real line, n-dimensional Euclidean space
$\mathfrak{B}$, $\mathfrak{B}_n$ Borel $\sigma$-field on $\mathcal{R}$, Borel $\sigma$-field on $\mathcal{R}_n$
$I_A$ indicator function of set $A$

$\varepsilon(x)$ $= 1$ if $x \ge 0$, and $= 0$ if $x < 0$

$\mu$ $EX$, expected value

$m_n$ $EX^n$, $n \ge 0$ integral

$\beta_\alpha$ $E|X|^\alpha$, $\alpha > 0$

$\mu_k$ $E(X - EX)^k$, $k \ge 0$ integral

$\sigma^2$ $= \mu_2$, variance

$f'$, $f''$, $f'''$ first, second, third derivatives of $f$

$\sim$ distributed as

$\approx$ asymptotically (or approximately) equal to
$\xrightarrow{L}$ convergence in law

$\xrightarrow{P}$ convergence in probability

$\xrightarrow{a.s.}$ convergence almost surely

$\xrightarrow{r}$ convergence in $r$th mean

RV random variable 

DF distribution function 

PDF probability density function 

PMF probability mass function 

PGF probability generating function 
MGF moment generating function 

d.f. degrees of freedom 

BLUE best linear unbiased estimator 
MLE maximum likelihood estimator 
MVUE minimum variance unbiased estimator 
UMA uniformly most accurate 

UMVUE uniformly minimum variance unbiased estimator 
UMAU uniformly most accurate unbiased 
MP most powerful 

UMP uniformly most powerful 

i.o. infinitely often

iid independent, identically distributed 
SD standard deviation 

MLR monotone likelihood ratio 

MSE mean square error 

WLLN weak law of large numbers 

SLLN strong law of large numbers 

CLT central limit theorem 

b(1, p) Bernoulli with parameter p 

b(n, p) binomial with parameters n, p 
NB(r; p) negative binomial with parameters r, p 
P(λ) Poisson with parameter λ
U[a, b] uniform on [a, b]
G(α, β) gamma with parameters α, β
B(α, β) beta with parameters α, β
χ²(n) chi-square with d.f. n
C(μ, θ) Cauchy with parameters μ, θ
N(μ, σ²) normal with mean μ, variance σ²


t(n) Student’s t with n d.f.
F(m, n) F-distribution with (m, n) d.f.
$z_\alpha$  100(1 − α)th percentile of N(0, 1)
$\chi^{2}_{n,\alpha}$  100(1 − α)th percentile of χ²(n)
$t_{n,\alpha}$  100(1 − α)th percentile of t(n)
$F_{m,n,\alpha}$  100(1 − α)th percentile of F(m, n)
AN($\mu_n$, $\sigma_n^{2}$) asymptotically normal
GLR generalized likelihood ratio
MRE minimum risk equivariant
ln x logarithm (to base e) of x
exp(x) exponential
LMP locally most powerful
$\mathcal{L}(X)$  law or distribution of RV X
$X \stackrel{d}{=} Y$  X and Y identically distributed
Statistical Tables


ST1 Cumulative Binomial Probabilities
ST2 Tail Probability Under Standard Normal Distribution
ST3 Critical Values Under Chi-Square Distribution
ST4 Student’s t-Distribution
ST5 F-Distribution: 5% and 1% Points for the Distribution of F
ST6 Random Normal Numbers, μ = 0 and σ = 1
ST7 Critical Values of the Kolmogorov-Smirnov One-Sample Test Statistic
ST8 Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of Equal Size
ST9 Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of Unequal Size
ST10 Critical Values of the Wilcoxon Signed-Rank Test Statistic
ST11 Critical Values of the Mann-Whitney-Wilcoxon Test Statistic
ST12 Critical Points of Kendall’s Tau Test Statistic
ST13 Critical Values of Spearman’s Rank Correlation Statistic
Table ST1. Cumulative Binomial Probabilities, $\sum_{x=0}^{r} \binom{n}{x} p^{x}(1 - p)^{n-x}$, r = 0, 1, 2, ..., n − 1


0.01 


0.9801 
0.9999 
0.9703 
0.9997 
1.0000 
0.9606 
0.9994 
1.0000 




0.9510 
0.9990 
1.0000 


0.9415 
0.9986 
1.0000 


0.9321 
0.9980 
1.0000 


0.9227 
0.9973 
0.9999 
1.0000 


0.9135 
0.9965 
0.9999 
1.0000 


0.05


0.10 


0.20 


0.30 


0.333 


0.40 


0.50 


0.9025 
0.9975 
0.8574 
0.9928 
0.9999 
0.8145 
0.9860 
0.9995 
1.0000 
0.7738 
0.9774 
0.9988 
0.9999 
1.0000 
0.7351 
0.9672 
0.9977 
0.9998 
0.9999 
1.0000 
0.6983 
0.9556 
0.9962 
0.9998 
1.0000 


0.6634 
0.9427 
0.9942 
0.9996 
1.0000 


0.6302 
0.9287 
0.9916 
0.9993 
0.9999 


0.8100 
0.9900 
0.7290 
0.9720 
0.9990 
0.6561 
0.9477 
0.9963 
0.9999 
0.5905 
0.9185 
0.9914 
0.9995 
1.0000 
0.5314 
0.8857 
0.9841 
0.9987 
0.9999 
1.0000 
0.4783 
0.6554 
0.8503 
0.9743 
0.9973 
0.9998 
1.0000 
0.4305 
0.8131 
0.9619 
0.9950 
0.9996 
1.0000 


0.3874 
0.7748 
0.9470 
0.9916 
0.9990 


0.6400 
0.9600 
0.5120 
0.8960 
0.9920 
0.4096 
0.8192 
0.9728 
0.9984 
0.3277 
0.7373 
0.9421 
0.9933 
0.9997 
0.2621 
0.6553 
0.9011 
0.9830 
0.9984 
0.9999 
0.2097 
0.5767 
0.8520 
0.9667 
0.9953 
0.9996 
1.0000 
0.1678 
0.5033 
0.7969 
0.9437 
0.9896 
0.9988 
1.0000 


0.1342 
0.4362 
0.7382 
0.9144 
0.9805 


0.4900 
0.9100 
0.3430 
0.7840 
0.9730 
0.2401 
0.6517 
0.9163 
0.9919 
0.1681 
0.5283 
0.8370 
0.9693 
0.9977 
0.1176 


* 0.4201 


0.7442 
0.9294 
0.9889 
0.9991 
0.0824 
0.3294 
0.6471 
0.8740 
0.9712 
0.9962 
0.9998 
0.0576 
0.2553 
0.5518 
0.8059 
0.9420 
0.9887 
0.9987 
0.9999 
0.0404 
0.1960 
0.4628 
0.7296 
0.9011 


0.4444 
0.8888 
0.2963 
0.7407 
0.9629 
0.1975 
0.5926 
0.8889 
0.9877 
0.1317 
0.4609 
0.7901 
0.9547 
0.9959 
0.0878 
0.3512 
0.6804 
0.8999 
0.9822 
0.9987 
0.0585 
0.2633 
0.5706 
0.8267 
0.9547 
0.9931 
0.9995 
0.0390 
0.1951 
0.4682 
0.7413 
0.9120 
0.9803 
0.9974 
0.9998 
0.0260 
0.1431 
0.3772 
0.6503 
0.8551 


0.3600 
0.8400 
0.2160 
0.6480 
0.9360 
0.1296 
0.4742 
0.8198 
0.9734 
0.0778 
0.3370 
0.6826 
0.9130 
0.9898 
0.0467 
0.2333 
0.5443 
0.8208 
0.9590 
0.9959 
0.0280 
0.1586 
0.4199 
0.7102 
0.9037 
0.9812 
0.9984 
0.0168 
0.1064 
0.3154 
0.5941 
0.8263 
0.9502 
0.9915 
0.9993 
0.0101 
0.0706 
0.2318 
0.4826 
0.7334 


0.2500 
0.7500 
0.1250 
0.5000 
0.8750 
0.0625 
0.3125 
0.6875 
0.9375 
0.0312 
0.1874 
0.4999 
0.8124 
0.9686 
0.0156 
0.1094 
0.3438 
0.6563 
0.8907 
0.9845 
0.0078 
0.0625 
0.2266 
0.5000 
0.7734 
0.9375 
0.9922 
0.0039 
0.0352 
0.1445 
0.3633 
0.6367 
0.8555 
0.9648 
0.9961 
0.0020 
0.0196 
0.0899 
0.2540 
0.5001 
0.9044
0.9958 
1.0000 


0.8954 
0.9948 
0.9998 
1.0000 


0.8864 
0.9938 
0.9998 
1.0000 
1.0000 
1.0000 


0.8775 
0.9928 
0.9997 
1.0000 


1.0000 


0.5987 
0.9138 
0.9884 
0.9989 
0.9999 
1.0000 


0.5688 
0.8981 
0.9848 
0.9984 
0.9999 
1.0000 


0.5404 
0.8816 
0.9804 
0.9978 
0.9998 
1.0000


0.5134 
0.8746 
0.9755 
0.9969 
0.9997 
1.0000 


0.9998 
0.9999 
1.0000 


0.3487 
0.7361 
0.9298 
0.9872 
0.9984 
0.9999 
1.0000 


0.3138 
0.6974 
0.9104 
0.9815 
0.9972 
0.9997 
1.0000 


0.2824 
0.6590 
0.8892 
0.9744 
0.9957 
0.9995 
1.0000 


0.2542 
0.6214 
0.8661 
0.9659 
0.9936 
0.9991 


0.9970 
0.9998 
1.0000


0.1074 
0.3758 
0.6778 
0.8791 
0.9672 
0.9936 
0.9991 
0.9999 
1.0000 


0.0859 
0.3221 
0.6174 
0.8389 
0.9496 
0.9884 
0.9981 
0.9998 
1.0000 


0.0687 
0.2749 
0.5584 
0.7946 
0.9806 
0.9961 
0.9994 
0.9999 
1.0000 


0.0550 
0.2337 
0.5017 
0.7473 
0.9009 
0.9700 


0.30 


0.333 


0.40 


0.50 


0.9746 
0.9956 
0.9995 
0.9999 
0.0282 
0.1493 
0.3828 
0.6496 
0.8497 
0.9526 
0.9894 
0.9984 
0.9998 
1.0000 
0.0198 
0.1130 
0.3128 
0.5696 
0.7897 
0.9218 
0.9784 
0.9947 
0.9994 
0.9999 
1.0000 
0.0139 
0.0850 
0.2528 
0.4925 
0.7237 
0.8822 
0.9614 
0.9905 
0.9983 
0.9998 
1.0000 


0.0097 
0.0637 
0.2025 
0.4206 
0.6543 
0.8346 


0.9575 
0.9916 
0.9989 
0.9998 
0.0173 
0.1040 
0.2991 
0.5592 
0.7868 
0.9234 
0.9803 
0.9966 
0.9996 
0.9999 
0.0116 
0.0752 
0.2341 
0.4726 
0.7110 
0.8779 
0.9614 
0.9912 
0.9986 
0.9999 
1.0000 
0.0077 
0.0540 
0.1811 
0.3931 
0.6315 
0.8223 
0.9336 
0.9812 
0.9962 
0.9995 
0.9999 
1.0000 
0.0052 
0.0386 
0.1388 
0.3224 
0.5521 
0.7587 


0.9006 
0.9749 
0.9961 
0.9996 
0.0060 
0.0463 
0.1672 
0.3812 
0.6320 
0.8327 
0.9442 
0.9867 
0.9973 
0.9999 
0.0036 
0.0320 
0.1189 
0.2963 
0.5328 
0.7535 
0.9007 
0.9707 
0.9941 
0.9993 
1.0000 
0.0022 
0.0196 
0.0835 
0.2254 
0.4382 
0.6652 
0.8418 
0.9427 
0.9848 
0.9972 
0.9997 
1.0000 
0.0013 
0.0126 
0.0579 
0.1686 
0.3531 
0.5744 


0.7462 
0.9103 
0.9806 
0.9982 
0.0010 
0.0108 
0.0547 
0.1719 
0.3770 
0.6231 
0.8282 
0.9454 
0.9893 
0.9991 
0.0005 
0.0059 
0.0327 
0.1133 
0.2744 
0.5000 
0.7256 
0.8867 
0.9673 
0.9941 
0.9995 
0.0002 
0.0032 
0.0193 
0.0730 
0.1939 
0.3872 
0.6128 
0.8062 
0.9270 
0.9807 
0.9968 
0.9998 
0.0000 
0.0017 
0.0112 
0.0462 
0.1334 
0.2905 
Table ST1 (Continued)


0.01 


0.8687 
0.9916 
0.9997 
1.0000 




15 0.8601 
0.9904 
0.9996 


1.0000 


0.05 


0.4877 
0.8470 
0.9700 
0.9958 
0.9996 
1.0000 


0.4633 
0.8291 
0.9638 
0.9946 
0.9994 
1.0000 


0.10 


0.20 


p
0.25 


0.30 


0.9999 
1.0000 


0.2288 
0.5847 
0.8416 
0.9559 
0.9908 
0.9986 
0.9998 
1.0000 


0.2059 
0.5491 
0.8160 
0.9444 
0.9873 
0.9978 
0.9997 
1.0000 


0.9930 
0.9988 


0.9998 
1.0000 


0.0440 
0.1979 
0.4480 
0.6982 
0.8702 
0.9562 
0.9884 
0.9976 
0.9996 
1.0000 


0.0352 
0.1672 
0.3980 
0.6482 
0.8358 
0.9390 
0.9820 
0.9958 
0.9992 
0.9999 
1.0000 


0.9757 
0.9944 


0.9990 
0.9999 
1.0000 


0.0178 
0.1010 
0.2812 
0.5214 
0.7416 
0.8884 
0.9618 
0.9897 
0.9979 
0.9997 
1.0000 


0.0134 
0.0802 
0.2361 
0.4613 
0.6865 
0.8516 
0.9434 
0.9827 
0.9958 
0.9992 
0.9999 
1.0000 


0.9376 
0.9818 


0.9960 
0.9994 
0.9999 
1.0000 


0.0068 
0.0475 
0.1608 
0.3552 
0.5842 
0.7805 
0.9067 
0.9686 
0.9917 
0.9984 
0.9998 
1.0000 


0.0048 
0.0353 
0.1268 
0.2969 
0.5255 
0.7216 
0.8689 
0.9500 
0.9848 


0.9964 


0.9993 
0.9999 
1.0000 
0.333


0.8965 
0.9654 


0.9912 
0.9984 
0.9998 
1.0000 


0.0034 
0.0274 
0.1054 
0.2612 
0.4755 
0.6898 
0.8506 
0.9424 
0.9826 
0.9960 
0.9993 
0.9999 
1.0000 


0.0023 
0.0194 
0.0794 
0.2092 
0.4041 
0.6184 
0.7970 
0.9118 
0.9692 
0.9915 
0.9982 
0.9997 
1.0000 


0.40 


0.7712 
0.9024 


0.9679 
0.9922 
0.9987 
0.9999 
1.0000 
0.0008 
0.0081 
0.0398 
0.1243 
0.2793 
0.4859 
0.6925 
0.8499 
0.9417 
0.9825 
0.9961 
0.9994 
0.9999 


0.0005 
0.0052 
0.0271 
0.0905 
0.2173 
0.4032 
0.6098 
0.7869 
0.9050 
0.9662 
0.9907 
0.9981 
0.9997 
1.0000 


0.50 


0.5000 
0.7095 


0.8666 
0.9539 
0.9888 
0.9983 
0.9999 
0.0000 
0.0009 
0.0065 
0.0287 
0.0898 
0.2120 
0.3953 
0.6048 
0.7880 
0.9102 
0.9713 
0.9936 
0.9991 
0.9999 
0.0000 
0.0005 
0.0037 
0.0176 
0.0592 
0.1509 
0.3036 
0.5000 
0.6964 
0.8491 
0.9408 
0.9824 
0.9963 
0.9995 
1.0000 


Source: For n = 2 through 10, adapted with permission from E. Parzen, Modern Probability Theory and 
Its Applications, Wiley, New York, 1962. For n = 11 through 15, adapted with permission from Tables of 
Cumulative Binomial Probability Distribution, Harvard University Press, Cambridge, Mass., 1955. 
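Entries of Table ST1 can be recomputed directly from the defining sum. The following is a minimal sketch using only the Python standard library; the function name `binom_cdf` is illustrative and not part of the text.

```python
from math import comb


def binom_cdf(n, p, r):
    """P(X <= r) for X ~ b(n, p), i.e. sum_{x=0}^{r} C(n, x) p^x (1-p)^(n-x)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(r + 1))


if __name__ == "__main__":
    # First block of the table: n = 2, p = 0.01, r = 0, 1.
    for r in range(2):
        print(f"n=2 p=0.01 r={r}: {binom_cdf(2, 0.01, r):.4f}")
```

Values computed this way may differ from the printed table in the last digit, since the table was adapted from older sources that rounded differently.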
Table ST2. Tail Probability Under Standard Normal Distribution*


z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7  0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2206 0.2177 0.2148
0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010


Source: Adapted with permission from P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 391.

*This table gives the probability that the standard normal variable Z will exceed a given positive value z, that is, P{Z > z_α} = α. The probabilities for negative values of z are obtained by symmetry.
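The tail probabilities in Table ST2 can be reproduced from the complementary error function, since P{Z > z} = ½ erfc(z/√2). A minimal sketch follows; the function name `upper_tail` is illustrative.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z > z) for a standard normal Z, using erfc from the standard library."""
    return 0.5 * erfc(z / sqrt(2.0))


if __name__ == "__main__":
    for z in (0.0, 0.76, 0.88, 1.96, 2.45, 3.09):
        print(f"z = {z:4.2f}  P(Z > z) = {upper_tail(z):.4f}")
```

For negative z the symmetry noted in the footnote applies: P{Z > −z} = 1 − P{Z > z}.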
Table ST4. Student’s t-Distribution^a


α

n 0.10 0.05 0.025 0.01 0.005 
1 3.078 6.314 12.706 31.821 63.657 
2 1.886 2.920 4.303 6.965 9.925 
3 1.638 2.353 3.182 4.541 5.841 
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032 
6 1.440 1.943 2.447 3.143 3.707 
7 1.415 1.895 2.365 2.998 3.499 
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250 
10 1.372 1.812 2.228 2.764 3.169 
11 1.363 1.796 2.201 2.718 3.106 
12 1.356 1.782 2.179 2.681 3.055 
13 1.350 1.771 2.160 2.650 3.012 
14 1.345 1.761 2.145 2.624 2.977 
15 1.341 1.753 2.131 2.602 2.947 
16 1.337 1.746 2.120 2.583 2.921 
17 1.333 1.740 2.110 2.567 2.898 
18 1.330 1.734 2.101 2.552 2.878 
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845 
21 1.323 1.721 2.080 2.518 2.831 
22 1.321 1.717 2.074 2.508 2.819 
23 1.319 1.714 2.069 2.500 2.807 
24 1.318 1.711 2.064 2.492 2.797 
25 1.316 1.708 2.060 2.485 2.787 
26 1.315 1.706 2.056 2.479 2.779 
27 1.314 1.703 2.052 2.473 2.771 
28 1.313 1.701 2.048 2.467 2.763 
29 1.311 1.699 2.045 2.462 2.756 
30 1.310 1.697 2.042 2.457 2.750 
40 1.303 1.684 2.021 2.423 2.704 
60 1.296 1.671 2.000 2.390 2.660 
120 1.289 1.658 1.980 2.358 2.617 
∞ 1.282 1.645 1.960 2.326 2.576


Source: P. G. Hoel, Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393. 
Reprinted by permission of John Wiley & Sons, Inc. 

^a The first column lists the number of degrees of freedom (n). The headings of the other columns give probabilities (α) for t to exceed the entry value. Use symmetry for negative t values.
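Critical values such as those in Table ST4 can be recomputed without special libraries. The sketch below integrates the t(n) density numerically (Simpson's rule) and inverts the tail probability by bisection; the function names, the step count, and the search interval are illustrative choices, not part of the text.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, n):
    """Density of Student's t with n degrees of freedom."""
    c = exp(lgamma((n + 1) / 2) - lgamma(n / 2) - 0.5 * log(n * pi))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)


def t_upper_tail(t, n, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's rule on [0, t] (steps is even)."""
    if t == 0:
        return 0.5
    h = t / steps
    s = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 - s * h / 3


def t_critical(n, alpha, lo=0.0, hi=200.0):
    """Value t with P(T > t) = alpha, found by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"{t_critical(4, 0.025):.3f}")
    print(f"{t_critical(10, 0.05):.3f}")
```

The search interval [0, 200] covers every entry of the table, the largest being the one for n = 1 and α = 0.005.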
Table ST6. Random Normal Numbers, μ = 0 and σ = 1
1 2 3 4 5 6 7 8 9 10 

0.464 0.137 2.455 -0.323 -0.068 0.290 -0.288 1.298 0.241 -0.957
0.060 -2.526 -0.531 -0.194 0.543 -1.558 0.187 -1.190 0.022 0.525
1.486 -0.354 -0.634 0.697 0.926 1.375 0.785 -0.963 -0.853 -1.865
1.022 -0.472 1.279 3.521 0.571 -1.851 0.194 1.192 -0.501 -0.273
1.394 -0.555 0.046 0.321 2.945 1.974 -0.258 0.412 0.439 -0.035
0.906 -0.513 -0.525 0.595 0.881 -0.934 1.579 0.161 -1.885 0.371
1.179 -1.055 0.007 0.769 0.971 0.712 1.090 -0.631 -0.255 -0.702
-1.501 -0.488 -0.162 -0.136 1.033 0.203 0.448 0.748 -0.423 -0.432
-0.690 0.756 -1.618 -0.345 -0.511 -2.051 -0.457 -0.218 0.857 -0.465
1.372 0.225 0.378 0.761 0.181 0.736 0.960 -1.530 -0.260 0.120
-0.482 1.678 -0.057 -1.229 -0.486 0.856 -0.491 -1.983 -2.830 -0.238
-1.376 -0.150 1.356 -0.561 -0.256 -0.212 0.219 0.779 0.953 -0.869
-1.010 0.598 -0.918 1.598 0.065 0.415 -0.169 0.313 -0.973 -1.016
-0.005 -0.899 0.012 -0.725 1.147 -0.121 1.096 0.481 -1.691 0.417
1.393 1.163 -0.911 1.231 -0.199 -0.246 1.239 -2.574 -0.558 0.056
-1.787 -0.261 1.237 1.046 -0.508 -1.630 -0.146 -0.392 -0.627 0.561
-0.105 -0.357 -1.384 0.360 -0.992 -0.116 -1.698 -2.832 -1.108 -2.357
—1.339 = =1.827 —0.959 0.424 0.969 —1.141 -—1.041 0.362 -—1.726 1.956 
1.041 0.535 0.731 1.377 0.983 -1.330 1.620 -1.040 0.524 -0.281
0.279 -2.056 0.717 -0.873 -1.096 -1.396 1.047 0.089 -0.573 0.932
1.805 -2.008 -1.633 0.542 0.250 -0.166 0.032 0.079 0.471 -1.029
-1.186 1.180 1.114 0.882 1.265 -0.202 0.151 -0.376 -0.310 0.479
0.658 -1.141 1.151 -1.210 0.927 0.425 0.290 -0.902 0.610 2.709
-0.439 0.358 -1.939 0.891 -0.227 0.602 0.873 -0.437 -0.220 -0.057
-1.399 -0.230 0.385 -0.649 -0.577 0.237 -0.289 0.513 0.738 -0.300
0.199 0.208 -1.083 -0.219 -0.291 1.221 1.119 0.004 -2.015 -0.594
0.159 0.272 -0.313 0.084 -2.828 -0.430 -0.792 -1.275 -0.623 -1.047
2.273 0.606 0.606 -0.747 0.247 1.291 0.063 -1.793 -0.699 -1.347
0.041 -0.307 0.121 0.790 -0.584 0.541 0.484 -0.986 0.481 0.996
-1.132 -2.098 0.921 0.145 0.446 -1.661 1.045 -1.363 -0.586 -1.023
0.768 0.079 -1.473 0.034 -2.127 0.665 0.084 -0.880 -0.579 0.551
0.375 -1.658 -0.851 0.234 -0.656 0.340 -0.086 -0.158 -0.120 0.418
-0.513 -0.344 0.210 -0.736 1.041 0.008 0.427 -0.831 0.191 0.074
0.292 -0.521 1.266 -1.206 -0.899 0.110 -0.528 -0.813 0.071 0.524
1.026 2.990 -0.574 -0.491 -1.114 1.297 -1.433 -1.345 -3.001 0.479
-1.334 1.278 -0.568 -0.109 -0.515 -0.566 2.923 0.500 0.359 0.326
-0.287 -0.144 -0.254 0.574 -0.451 -1.181 -1.190 -0.318 -0.094 1.114
0.161 -0.886 -0.921 -0.509 1.410 -0.518 0.192 -0.432 1.501 1.068
-1.346 0.193 -1.202 0.394 -1.045 0.843 0.942 1.045 0.031 0.772
1.250 -0.199 -0.288 1.810 1.378 0.584 1.216 0.733 0.402 0.226
0.630 -0.537 0.782 0.060 0.499 -0.431 1.705 1.164 0.884 -0.298
0.375 -1.941 0.247 -0.491 0.665 -0.135 -0.145 -0.498 0.457 1.064
-1.420 0.489 -1.711 -1.186 0.754 -0.732 -0.066 1.006 -0.798 0.162
-0.151 -0.243 -0.430 -0.762 0.298 1.049 1.810 2.885 -0.768 -0.129
-0.309 0.531 0.416 -1.541 1.456 2.040 -0.124 0.196 0.023 -1.204
0.424 -0.444 0.593 0.993 -0.106 0.116 0.484 -1.272 1.066 1.097
0.593 0.658 -1.127 -1.407 -1.579 -1.616 1.458 1.262 0.736 -0.916
0.862 -0.885 -0.142 -0.504 0.532 1.381 0.022 -0.281 -0.342 1.222
0.235 -0.628 -0.023 -0.463 -0.899 -0.394 -0.538 1.707 0.188 -1.153
-0.853 0.402 0.777 0.833 0.410 -0.349 -1.094 0.580 1.395 1.298


Source: From tables of the RAND Corporation, by permission. 
Table ST7. Critical Values of the Kolmogorov-Smirnov One-Sample Test Statistic^a


One-Sided Test: α =   0.10   0.05   0.025  0.01   0.005
Two-Sided Test: α =   0.20   0.10   0.05   0.02   0.01

n = 1                 0.900  0.950  0.975  0.990  0.995
2                     0.684  0.776  0.842  0.900  0.929
3                     0.565  0.636  0.708  0.785  0.829
4                     0.493  0.565  0.624  0.689  0.734
5                     0.447  0.509  0.563  0.627  0.669
6                     0.410  0.468  0.519  0.577  0.617
7                     0.381  0.436  0.483  0.538  0.576
8                     0.358  0.410  0.454  0.507  0.542
9                     0.339  0.387  0.430  0.480  0.513
10                    0.323  0.369  0.409  0.457  0.489
11                    0.308  0.352  0.391  0.437  0.468
12                    0.296  0.338  0.375  0.419  0.449
13                    0.285  0.325  0.361  0.404  0.432
14                    0.275  0.314  0.349  0.390  0.418
15                    0.266  0.304  0.338  0.377  0.404
16                    0.258  0.295  0.327  0.366  0.392
17                    0.250  0.286  0.318  0.355  0.381
18                    0.244  0.279  0.309  0.346  0.371
19                    0.237  0.271  0.301  0.337  0.361
20                    0.232  0.265  0.294  0.329  0.352
21                    0.226  0.259  0.287  0.321  0.344
22                    0.221  0.253  0.281  0.314  0.337
23                    0.216  0.247  0.275  0.307  0.330
24                    0.212  0.242  0.269  0.301  0.323
25                    0.208  0.238  0.264  0.295  0.317
26                    0.204  0.233  0.259  0.290  0.311
27                    0.200  0.229  0.254  0.284  0.305
28                    0.197  0.225  0.250  0.279  0.300
29                    0.193  0.221  0.246  0.275  0.295
30                    0.190  0.218  0.242  0.270  0.290
31                    0.187  0.214  0.238  0.266  0.285
32                    0.184  0.211  0.234  0.262  0.281
33                    0.182  0.208  0.231  0.258  0.277
34                    0.179  0.205  0.227  0.254  0.273
35                    0.177  0.202  0.224  0.251  0.269
36                    0.174  0.199  0.221  0.247  0.265
37                    0.172  0.196  0.218  0.244  0.262
38                    0.170  0.194  0.215  0.241  0.258
39                    0.168  0.191  0.213  0.238  0.255
40                    0.165  0.189  0.210  0.235  0.252

Approximation
for n > 40:           1.07/√n  1.22/√n  1.36/√n  1.52/√n  1.63/√n


Source: Adapted by permission from Table 1 of Leslie H. Miller, Table of percentage points of Kolmogorov statistics, J. Am. Stat. Assoc. 51 (1956), 111-121.

^a This table gives the values of $D^{+}_{n,\alpha}$ and $D_{n,\alpha}$ for which $\alpha \ge P\{D^{+}_{n} > D^{+}_{n,\alpha}\}$ and $\alpha \ge P\{D_{n} > D_{n,\alpha}\}$ for some selected values of n and α.
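For n > 40 the table gives only a large-sample approximation of the form c/√n. A minimal sketch of that lookup, assuming the coefficients 1.07, 1.22, 1.36, 1.52, and 1.63 from the approximation row and indexing by the two-sided level, might look like this (the names are illustrative):

```python
from math import sqrt

# Two-sided level -> coefficient c in the approximation c / sqrt(n), n > 40.
KS_COEFFICIENTS = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}


def ks_one_sample_approx(n, two_sided_alpha):
    """Approximate critical value of D_n for n > 40."""
    if n <= 40:
        raise ValueError("use the tabulated values for n <= 40")
    return KS_COEFFICIENTS[two_sided_alpha] / sqrt(n)


if __name__ == "__main__":
    print(ks_one_sample_approx(100, 0.05))
```

The one-sided critical value at level α is read from the same coefficient at two-sided level 2α, as the paired column headings of the table indicate.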
Table ST8. Critical Values of the Kolmogorov-Smirnov Test Statistic for Two Samples of Equal Size*


One-Sided Test: α =   0.10    0.05    0.025   0.01    0.005
Two-Sided Test: α =   0.20    0.10    0.05    0.02    0.01

n = 3                 2/3     2/3
4                     3/4     3/4     3/4
5                     3/5     3/5     4/5     4/5     4/5
6                     3/6     4/6     4/6     5/6     5/6
7                     4/7     4/7     5/7     5/7     5/7
8                     4/8     4/8     5/8     5/8     6/8
9                     4/9     5/9     5/9     6/9     6/9
10                    4/10    5/10    6/10    6/10    7/10
11                    5/11    5/11    6/11    7/11    7/11
12                    5/12    5/12    6/12    7/12    7/12
13                    5/13    6/13    6/13    7/13    8/13
14                    5/14    6/14    7/14    7/14    8/14
15                    5/15    6/15    7/15    8/15    8/15
16                    6/16    6/16    7/16    8/16    9/16
17                    6/17    7/17    7/17    8/17    9/17
18                    6/18    7/18    8/18    9/18    9/18
19                    6/19    7/19    8/19    9/19    9/19
20                    6/20    7/20    8/20    9/20    10/20
21                    6/21    7/21    8/21    9/21    10/21
22                    7/22    8/22    8/22    10/22   10/22
23                    7/23    8/23    9/23    10/23   10/23
24                    7/24    8/24    9/24    10/24   11/24
25                    7/25    8/25    9/25    10/25   11/25
26                    7/26    8/26    9/26    10/26   11/26
27                    7/27    8/27    9/27    11/27   11/27
28                    8/28    9/28    10/28   11/28   12/28
29                    8/29    9/29    10/29   11/29   12/29
30                    8/30    9/30    10/30   11/30   12/30
31                    8/31    9/31    10/31   11/31   12/31
32                    8/32    9/32    10/32   12/32   12/32
34                    8/34    10/34   11/34   12/34   13/34
36                    9/36    10/36   11/36   12/36   13/36
38                    9/38    10/38   11/38   13/38   14/38
40                    9/40    10/40   12/40   13/40   14/40

Approximation
for n > 40:           1.52/√n 1.73/√n 1.92/√n 2.15/√n 2.30/√n


Source: Adapted by permission from Tables 2 and 3 of Z. W. Birnbaum and R. A. Hall, Small sample 
distributions for multisample statistics of the Smirnov type, Ann. Math. Stat. 31 (1960), 710-720. 

*This table gives the values of $D^{+}_{n,n,\alpha}$ and $D_{n,n,\alpha}$ for which $\alpha \ge P\{D^{+}_{n,n} > D^{+}_{n,n,\alpha}\}$ and $\alpha \ge P\{D_{n,n} > D_{n,n,\alpha}\}$ for some selected values of n and α.




Table ST9. Critical Values of the Kolmogorov—Smirnov Test Statistic for Two Samples 
of Unequal Size* 


One-Sided Test: α =         0.10    0.05    0.025   0.01    0.005
Two-Sided Test: α =         0.20    0.10    0.05    0.02    0.01

N1 = 1    N2 = 9            17/18
          10                9/10
N1 = 2    N2 = 3            5/6
          4                 3/4
          5                 4/5     4/5
          6                 5/6     5/6
          7                 5/7     6/7
          8                 3/4     7/8     7/8
          9                 7/9     8/9     8/9
          10                7/10    4/5     9/10
N1 = 3    N2 = 4            3/4     3/4
          5                 2/3     4/5     4/5
          6                 2/3     2/3     5/6
          7                 2/3     5/7     6/7     6/7
          8                 5/8     3/4     3/4     7/8
          9                 2/3     2/3     7/9     8/9     8/9
          10                3/5     7/10    4/5     9/10    9/10
          12                7/12    2/3     3/4     5/6     11/12
N1 = 4    N2 = 5            3/5     3/4     4/5     4/5
          6                 7/12    2/3     3/4     5/6     5/6
          7                 17/28   5/7     3/4     6/7     6/7
          8                 5/8     5/8     3/4     7/8     7/8
          9                 5/9     2/3     3/4     7/9     8/9
          10                11/20   13/20   7/10    4/5     4/5
          12                7/12    2/3     2/3     3/4     5/6
          16                9/16    5/8     11/16   3/4     13/16
N1 = 5    N2 = 6            3/5     2/3     2/3     5/6     5/6
          7                 4/7     23/35   5/7     29/35   6/7
          8                 11/20   5/8     27/40   4/5     4/5
          9                 5/9     3/5     31/45   7/9     4/5
          10                1/2     3/5     7/10    7/10    4/5
          15                8/15    3/5     2/3     11/15   11/15
          20                1/2     11/20   3/5     7/10    3/4
Table ST9 (Continued)

One-Sided Test: α =         0.10    0.05    0.025   0.01    0.005
Two-Sided Test: α =         0.20    0.10    0.05    0.02    0.01

N1 = 6    N2 = 7            23/42   4/7     29/42   5/7     5/6
          8                 1/2     7/12    2/3     3/4     3/4
          9                 1/2     5/9     2/3     13/18   7/9
          10                1/2     17/30   19/30   7/10    11/15
          12                1/2     7/12    7/12    2/3     3/4
          18                4/9     5/9     11/18   2/3     13/18
          24                11/24   1/2     7/12    5/8     2/3
N1 = 7    N2 = 8            27/56   33/56   5/8     41/56   3/4
          9                 31/63   5/9     40/63   5/7     47/63
          10                33/70   39/70   43/70   7/10    5/7
          14                3/7     1/2     4/7     9/14    5/7
          28                3/7     13/28   15/28   17/28   9/14
N1 = 8    N2 = 9            4/9     13/24   5/8     2/3     3/4
          10                19/40   21/40   23/40   27/40   7/10
          12                11/24   1/2     7/12    5/8     2/3
          16                7/16    1/2     9/16    5/8     5/8
          32                13/32   7/16    1/2     9/16    19/32
N1 = 9    N2 = 10           7/15    1/2     26/45   2/3     31/45
          12                4/9     1/2     5/9     11/18   2/3
          15                19/45   22/45   8/15    3/5     29/45
          18                7/18    4/9     1/2     5/9     11/18
          36                13/36   5/12    17/36   19/36   5/9
N1 = 10   N2 = 15           2/5     7/15    1/2     17/30   19/30
          20                2/5     9/20    1/2     11/20   3/5
          40                7/20    2/5     9/20    1/2
N1 = 12   N2 = 15           23/60   9/20    1/2     11/20   7/12
          16                3/8     7/16    23/48   13/24   7/12
          18                13/36   5/12    17/36   19/36   5/9
          20                11/30   5/12    7/15    31/60   17/30
N1 = 15   N2 = 20           7/20    2/5     13/30   29/60   31/60
N1 = 16   N2 = 20           27/80   31/80   17/40   19/40   41/80

Large-sample approximation: 1.07√((m+n)/mn)  1.22√((m+n)/mn)  1.36√((m+n)/mn)  1.52√((m+n)/mn)  1.63√((m+n)/mn)
Source: Adapted by permission from F. J. Massey, Distribution table for the deviation between two sample 
cumulatives, Ann. Math. Stat. 23 (1952), 435-441. 
*This table gives the values of $D^{+}_{m,n,\alpha}$ and $D_{m,n,\alpha}$ for which $\alpha \ge P\{D^{+}_{m,n} > D^{+}_{m,n,\alpha}\}$ and $\alpha \ge P\{D_{m,n} > D_{m,n,\alpha}\}$ for some selected values of N1 = smaller sample size, N2 = larger sample size, and α.
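The large-sample rows of Tables ST8 and ST9 can be checked against each other. The sketch below assumes that the ST9 approximation is c√((m+n)/(mn)) with c = 1.07, 1.22, 1.36, 1.52, 1.63 (the same coefficients as in the one-sample table) and that the ST8 approximation is c′/√n with c′ = 1.52, 1.73, 1.92, 2.15, 2.30; both coefficient lists are read from a damaged scan and should be treated as assumptions. Setting m = n in the first formula gives c√2/√n, so c′ should be close to c√2.

```python
from math import sqrt

ONE_SIDED_ALPHA = (0.10, 0.05, 0.025, 0.01, 0.005)
C_UNEQUAL = (1.07, 1.22, 1.36, 1.52, 1.63)   # assumed ST9 coefficients
C_EQUAL = (1.52, 1.73, 1.92, 2.15, 2.30)     # assumed ST8 coefficients


def two_sample_approx(m, n, c):
    """Large-sample critical value c * sqrt((m + n) / (m n))."""
    return c * sqrt((m + n) / (m * n))


if __name__ == "__main__":
    n = 50
    for alpha, c, c_eq in zip(ONE_SIDED_ALPHA, C_UNEQUAL, C_EQUAL):
        general = two_sample_approx(n, n, c)
        equal = c_eq / sqrt(n)
        print(f"alpha={alpha}: {general:.4f} vs {equal:.4f}")
```

The two columns printed agree to about three decimal places, which is what rounding the coefficients to two decimals would lead one to expect.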
Table ST10. Critical Values of the Wilcoxon Signed-Rank Test Statistic^a


         α
n    0.01   0.025  0.05   0.10
3    6      6      6      6
4    10     10     10     9
5    15     15     14     12
6    21     20     18     17
7    27     25     24     22
8    34     32     30     27
9    41     39     36     34
10   49     46     44     40
11   58     55     52     48
12   67     64     60     56
13   78     73     69     64
14   89     84     79     73
15   100    94     89     83
16   112    106    100    93
17   125    118    111    104
18   138    130    123    115
19   152    143    136    127
20   166    157    149    140


Source: Adapted by permission from Table 1 of R. L. McCormack, Extended tables of the Wilcoxon 
matched pairs signed-rank statistics, J. Am. Stat. Assoc. 60 (1965), 864-871. 

^a This table gives values of $t_\alpha$ for which $P\{T^{+} > t_\alpha\} \le \alpha$ for selected values of n and α. Critical values in the lower tail may be obtained by symmetry from the equation $t_{1-\alpha} = n(n + 1)/2 - t_\alpha$.
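For small n the entries of Table ST10 can be recomputed from the exact null distribution of T⁺, under which each of the 2ⁿ subsets of ranks is equally likely to be the set of positive ranks. The sketch below counts subsets by rank sum and returns the smallest t with P{T⁺ > t} ≤ α; the function names are illustrative.

```python
from fractions import Fraction


def signed_rank_counts(n):
    """counts[t] = number of subsets of {1, ..., n} whose elements sum to t."""
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for t in range(total, rank - 1, -1):
            counts[t] += counts[t - rank]
    return counts


def signed_rank_critical(n, alpha):
    """Smallest t such that P(T+ > t) <= alpha under the null hypothesis."""
    alpha = Fraction(str(alpha))
    counts = signed_rank_counts(n)
    t = len(counts) - 1          # start at the maximum n(n+1)/2
    tail = 0                     # number of outcomes with T+ > t
    while t > 0 and Fraction(tail + counts[t], 2 ** n) <= alpha:
        tail += counts[t]
        t -= 1
    return t


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, [signed_rank_critical(n, a) for a in (0.01, 0.025, 0.05, 0.10)])
```

Exact fractions are used so that boundary cases, where the tail probability equals α, are decided without floating-point error.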
Table ST11. Critical Values of the Mann-Whitney-Wilcoxon Test Statistic^a


m a 2 3 4 5 6 7 8 9 10 
2 0.01 4 6 8 10 12 14 16 18 20 
0.025 4 6 8 10 12 14 15 17 19 

0.05 4 6 8 9 11 13 14 16 18 

0.10 4 5 7 8 10 12 13 15 16 

3 0.01 9 12 15 18 20 23 25 28
0.025 9 12 14 16 19 21 24 26 

0.05 8 11 13 15 18 20 22 25 

0.10 7 10 12 14 16 18 21 23 

4 0.01 16 19 22 26 29 32 36 
0.025 15 18 21 24 27 31 34 

0.05 14 17 20 23 26 29 32 

0.10 12 15 18 21 24 26 29 

5 0.01 23 27 31 35 39 43 
0.025 22 26 29 33 37 41 

0.05 20 24 28 31 35 38 

0.10 19 22 26 29 32 36 

6 0.01 32 37 41 46 51 
0.025 30 35 39 43 48 

0.05 28 33 37 41 45 

0.10 26 30 34 38 42 

7 0.01 42 48 53 58
0.025 40 45 50 55 

0.05 37 42 47 52 

0.10 35 39 44 48 

8 0.01 54 60 66 
0.025 50 56 62 

0.05 48 53 59 

0.10 44 49 55 

9 0.01 66 73 
0.025 63 69 

0.05 59 65 

0.10 55 61 

10 0.01 80 
0.025 76 

0.05 72 

0.10 67 


Source: Adapted by permission from Table 1 of L. R. Verdooren, Extended tables of critical values for 
Wilcoxon’s test statistic, Biometrika 50 (1963), 177-186, with the kind permission of Professor E. S. 
Pearson, the author, and the Biometrika Trustees. 

^a This table gives values of $u_\alpha$ for which $P\{U > u_\alpha\} \le \alpha$ for some selected values of m, n, and α. Critical values in the lower tail may be obtained by symmetry from the equation $u_{1-\alpha} = mn - u_\alpha$.
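The entries of Table ST11 can likewise be recomputed for small m and n from the exact null distribution of U, taking U to be the number of pairs (X_i, Y_j) with X_i > Y_j and all C(m+n, m) orderings equally likely. The recursion below conditions on whether the largest observation is an X or a Y; the function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(m, n, u):
    """Number of orderings of m X's and n Y's with exactly u pairs X_i > Y_j."""
    if u < 0 or u > m * n:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Largest observation is an X (adds n pairs) or a Y (adds none).
    return count_u(m - 1, n, u - n) + count_u(m, n - 1, u)


def mww_critical(m, n, alpha):
    """Smallest u such that P(U > u) <= alpha under the null hypothesis."""
    alpha = Fraction(str(alpha))
    total = comb(m + n, m)
    u = m * n
    tail = 0                     # number of orderings with U > u
    while u > 0 and Fraction(tail + count_u(m, n, u), total) <= alpha:
        tail += count_u(m, n, u)
        u -= 1
    return u


if __name__ == "__main__":
    for n in range(3, 11):
        print(3, n, [mww_critical(3, n, a) for a in (0.01, 0.025, 0.05, 0.10)])
```

Running the loop for m = 3 gives a way to cross-check a whole block of the table at once.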
Table ST12. Critical Points of Kendall’s Tau Test Statistic*


         α
n    0.100  0.050  0.025  0.01
3    3      3      3      3
4    4      4      6      6
5    6      6      8      8
6    7      9      11     11
7    9      11     13     15
8    10     14     16     18
9    12     16     18     22
10   15     19     21     25


Source: Adapted by permission from Table 1, p. 173, of M. G. Kendall, Rank Correlation Methods, 3rd ed., Charles Griffin, London, 1962. For values of n ≥ 11, see W. J. Conover, Practical Nonparametric Statistics, Wiley, New York, 1971, p. 390.

*This table gives the values of $S_\alpha$ for which $P\{S > S_\alpha\} \le \alpha$, where $S = \binom{n}{2}T$, for some selected values of α and n. Values in the lower tail may be obtained by symmetry, $S_{1-\alpha} = -S_\alpha$.
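Under the null hypothesis every permutation of the ranks is equally likely, and S equals the number of concordant pairs minus the number of discordant pairs, that is, n(n − 1)/2 minus twice the number of inversions. The sketch below, with illustrative function names, counts permutations by number of inversions and returns the smallest attainable s with P{S > s} ≤ α.

```python
from fractions import Fraction
from math import factorial


def inversion_counts(n):
    """counts[k] = number of permutations of n items with exactly k inversions."""
    counts = [1]
    for size in range(2, n + 1):
        new = [0] * (len(counts) + size - 1)
        for inv, c in enumerate(counts):
            for extra in range(size):   # inserting the new item adds 0..size-1 inversions
                new[inv + extra] += c
        counts = new
    return counts


def kendall_s_critical(n, alpha):
    """Smallest attainable s with P(S > s) <= alpha, where S = C(n,2) - 2 * inversions."""
    alpha = Fraction(str(alpha))
    counts = inversion_counts(n)
    max_s = n * (n - 1) // 2
    inv = 0                      # current candidate s = max_s - 2 * inv
    tail = 0                     # number of permutations with S > s
    while inv < len(counts) - 1 and Fraction(tail + counts[inv], factorial(n)) <= alpha:
        tail += counts[inv]
        inv += 1
    return max_s - 2 * inv


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, [kendall_s_critical(n, a) for a in (0.100, 0.050, 0.025, 0.01)])
```

The printed rows can be compared line by line with the table above.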


Table ST13. Critical Values of Spearman’s Rank Correlation Statistic* 


α

n 0.01 0.025 0.05 0.10 
3 1.000 1.000 1.000 1.000 
4 1.000 1.000 0.800 0.800 
5 0.900 0.900 0.800 0.700 
6 0.886 0.829 0.771 0.600 
7 0.857 0.750 0.679 0.536 
8 0.810 0.714 0.619 0.500 
9 0.767 0.667 0.583 0.467 
10 0.721 0.636 0.552 0.442 


Source: Adapted by permission from Table 2, pp. 174-175, of M. G. Kendall, Rank Correlation Methods, 
3rd ed., Charles Griffin, London, 1962. For values of n ≥ 11, see W. J. Conover, Practical Nonparametric
Statistics, Wiley, New York, 1971, p. 391. 

*This table gives the values of $R_\alpha$ for which $P\{R > R_\alpha\} \le \alpha$ for some selected values of n and α. Critical values in the lower tail may be obtained by symmetry, $R_{1-\alpha} = -R_\alpha$.
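For very small n the critical values of Spearman's statistic can be recomputed by brute force. The sketch below assumes the usual form R = 1 − 6Σd²/(n(n² − 1)) with no ties and all n! rank permutations equally likely under the null hypothesis; the function name is illustrative, and full enumeration is practical only up to about n = 9.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def spearman_critical(n, alpha):
    """Smallest attainable r with P(R > r) <= alpha, returned as an exact Fraction."""
    alpha = Fraction(str(alpha))
    # Distribution of sum of squared rank differences over all permutations.
    d2 = Counter(sum((i - p) ** 2 for i, p in enumerate(perm))
                 for perm in permutations(range(n)))
    tail = 0                       # permutations with R strictly above the candidate
    crit = max(d2)
    for value in sorted(d2):       # increasing sum of d^2 means decreasing R
        if Fraction(tail + d2[value], factorial(n)) > alpha:
            crit = value
            break
        tail += d2[value]
    return 1 - Fraction(6 * crit, n * (n * n - 1))


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, [float(spearman_critical(n, a)) for a in (0.01, 0.025, 0.05, 0.10)])
```

Working with the integer Σd² rather than R itself avoids floating-point ties when tail probabilities are compared with α.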
Answers to Selected Problems


Problems 1.3 

1. (a) Yes; (b) yes; (c) no. 2. (a) Yes; (b) no; (c) no. 

6. (a) 0.9; (b) 0.05; (c) 0.95. 7. 1/16. 8. 4+ 21n2 = 0.487. 
Problems 1.4 


6.1 = 7Ps/15 8. (ye => (7) / (ca): 


12. (a) 4/ (5): wee / G) ©) 2B (7) i &) . 
@) 13(3) 2(3)/(G)© Be - 94-4] / (2) 


© (toa? ~4-9«] / (F).@ 13(3) (S)#/ (4 


“OOOH EC OYGH/B) 


Problems 1.5 
3. a(pb)’ 2 (" : ) [pd — b)I. 4. p/(2— p). 
ul = +1 
5. G/N)! / S°G/N)" ~ — for large N. 6.n=4. 
2 p3 n+2 
10. r/(r +g). 11. (a) 1/4; (b) 1/3. 12. 0.08. 
13. (a) 173/480; (b) 108/173, 15/173. 14. 0.0872. 
Problems 1.6


1. 1/(2 — p); (1 — p)/(@2 — p). 4. p?(1 — p)?[3 ~ 7p — p)). 
12. For any two disjoint intervals 1,;, l, © (a,b), €(1\)€U2) = (b — a)€(1, N I), where 
£(1) = length of interval /. 


13. (a) p ee len, » (b) 22/45 
: in = n~2 2 n-2 2 : 
2(56) (3) +2(8)" *(%) +2(% (g) ne? 
(©) 12/36; 2 (ZY? (Z) (3) +2 8)" * (8) (4) +2(B)" (8) (S) forn =2,3,. 
Problems 2.2 
3. Yes; yes. 


4.9; (1,1, 1,12), 0, 11,2, ), (1, 1,2,1,D,02, 61,0, 2 41,1, Dk (6,6, 6, 6, 6)}; 
{(6, 6, 6, 6, 6), (6, 6, 6, 6, 5), (6, 6, 6, 5, 6), (6, 6, 5, 6, 6), (6, 5, 6, 6, 6), (5, 6, 6, 6, 6)}. 
5. Yes; (1/4, 1/2) U (3/4, 1). 


Problems 2.3 


x          0    1    2    3
P(X = x)   1/8  3/8  3/8  1/8


F(x) = 0, x < 0; = 1/8, 0 ≤ x < 1; = 1/2, 1 ≤ x < 2; = 7/8, 2 ≤ x < 3; = 1, x ≥ 3.
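The step heights of a discrete DF are the running totals of the PMF. A minimal sketch with exact fractions, using the PMF values 1/8, 3/8, 3/8, 1/8 on the points 0, 1, 2, 3 (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

# PMF on the points 0, 1, 2, 3.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# F(x) on [k, k+1) is the sum of the PMF over points <= k.
cdf = dict(zip(pmf, accumulate(pmf.values())))

if __name__ == "__main__":
    for point, value in cdf.items():
        print(point, value)
```

The running totals are 1/8, 1/2, 7/8, and 1, and F(x) = 0 to the left of the first point.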
3. (a) Yes; (b) yes; (c) yes; yes. 


Problems 2.4 
1. — py —-(— pst Nea. 
; ———; 1/x?; (d) e7*. 
2. (b) ad +x) (c) 1/x*; (d) e 
3. Yes; $F_\theta(x) = 0$, x ≤ 0, $= 1 - e^{-\theta x} - \theta x e^{-\theta x}$ for x > 0; $P(X \ge 1) = 1 - F_\theta(1)$.
4. Yes; Fis) = 0, <0,=1~( 5) efor > 0. 
6. $F(x) = e^{x}/2$ for x ≤ 0, $= 1 - e^{-x}/2$ for x > 0.
8. (c), (d), and (f). 
9. Yes; (a) 1/2,0<x<1,1/4for2<x<4; (b)1/(2@), |x| < 4; 
(c)xe*,x>0; (d) (x —1)/4 for 1 <x <3, and P(X = 3) = 1/2; 


(e) Axe’, x>0. 
10. If S(x) = 1 — F(x) = P(X > x), then S’(x) = — f(x). 


Problems 2.5 


2. x41/x. 
4. 6U1 — exp(— —276)] /y— ye -@ arc COS y + e720+8 arc COs 1, ly] <1. 
6 exp{—6 arctan z}[(1 Ger —eF"}-1, z>0, 
6 exp{—z6 — arctan z}[(1 + 22)(1 — e")]"', z < 0. 
10. $f_{|X|}(y)$ = 2/3 for 0 < y < 1, = 1/3 for 1 < y < 2.
12. (a) 0, y < 0; F(0) for −1 < y < 1, and 1 for y > 1;
(b) = 0 if y < −b, = F(−b) if y = −b, = F(y) if −b < y < b, = 1 if y > b;
(c) = F(y) if y < −b, = F(−b) if −b < y < 0, = F(b) if 0 < y < b, = F(y) if y > b.


Problems 3.2 


3. $EX^{2r} = 0$ if 2r < 2m − 1 is an odd integer,
$= \dfrac{\Gamma(m - r - 1/2)\,\Gamma(r + 1/2)}{\Gamma(1/2)\,\Gamma(m - 1/2)}$ if 2r < 2m − 1 is an even integer.


9. 3p = a(1 — v)/v where v = (1 — p)'*. 
10. Binomial: $\alpha_3 = (q - p)/\sqrt{npq}$, $\alpha_4 = 3 + (1 - 6pq)/(npq)$.
Poisson: $\alpha_3 = \lambda^{-1/2}$, $\alpha_4 = 3 + 1/\lambda$.
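The binomial coefficients of skewness and kurtosis can be checked numerically against the closed forms (q − p)/√(npq) and 3 + (1 − 6pq)/(npq). The sketch below computes α₃ = μ₃/μ₂^{3/2} and α₄ = μ₄/μ₂² directly from the PMF; the function name and the test values n = 12, p = 0.3 are illustrative choices.

```python
from math import comb, sqrt


def binomial_shape(n, p):
    """Return (alpha_3, alpha_4) for b(n, p), computed from central moments of the PMF."""
    q = 1 - p
    pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * w for x, w in enumerate(pmf))

    def central(k):
        return sum((x - mean) ** k * w for x, w in enumerate(pmf))

    return central(3) / central(2) ** 1.5, central(4) / central(2) ** 2


if __name__ == "__main__":
    n, p = 12, 0.3
    q = 1 - p
    print(binomial_shape(n, p))
    print((q - p) / sqrt(n * p * q), 3 + (1 - 6 * p * q) / (n * p * q))
```

The two printed pairs agree to floating-point accuracy; the Poisson values follow in the limit n → ∞ with np = λ fixed.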


Problems 3.3 


1. (b) e* (es — 1)/1 — e*); ©) pl — gs)" 1/10 — gs) — gh"), 8 < 1/9. 
6. f(Os)/f (8), f(e')/F@). 


Problems 3.4 
; o a? x? 
3. For any o > 0 take P(X =x) = ao? (x= -=) = gee # 0. 
a1 K2 — [4 o*[K? = 17 
5. P| X? = ——.—__ ] = ——~ > = W(1 < iK 2; 
( erat) pig + K404 — 2K204 gee. 
2 39 Ba ~ 0" 
P(X°=K = 
( eS pty + K4o04* —2K2o4 
Problems 4.2 
1. No. 4. 1/6; 0. 7. Marginals negative binomial, so also conditionals. 


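8. $h(y|x) = \tfrac{1}{2}(c^{2} + x^{2})/(c^{2} + x^{2} + y^{2})^{3/2}$.

9. $X \sim B(p_1, p_2 + p_3)$; $Y/(1 - x) \sim B(p_2, p_3)$.

10. $X \sim G(\alpha, 1/\beta)$, $Y \sim G(\alpha + \gamma, 1/\beta)$, $X/y \sim B(\alpha, \gamma)$, $Y - x \sim G(\gamma, 1/\beta)$.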
14. P(X <7)=1—e77. 15. 1/24; 15/16. 17. 6. 


Problems 4.3 


3. No, yes, no. 
10. = 1 − a/(2b) if a < b, = b/(2a) if a > b.
11. λ/(λ + μ), 1/2.


Problems 4.4 


2. (b) $f_{V|U}(v|u) = 1/(2u)$, |v| < u, u > 0.

6. P(X =x,M=m)=x(l—x)"0—-CU—2)™ Jifx=m, =n?(t —2)™ 
ifx <m. P(M =m) = 2n(1—2)™ —x(2—2)1—2)™",m>0. 

7. $f_X(x) = \lambda^{k} e^{-\lambda}/k!$, k ≤ x < k + 1, k = 0, 1, 2, ....

11. fy(u) = 3u?/(1 +u)*, u > 0. 
2
13. (a) Fy,y(u, v) = [i — exp (-$5) (4 = *) ifu > 0, |v] < 2/2, 


202 2 
= 1 —exp[1 — u*/(207)] if u > 0, v > 2/2, = 0 elsewhere; 
spl2-le-v2 
b ,v) = —-e* ——__. 
Pee Jam T(1/2)V2 
Problems 4.5 
pei 2e+l 2b+2 
EXSY! = ————._ +- -——-_—-—_.. 3. X,Y)=0;X,Y¥d dent. 
2 4 a+) 1 34+ C+D akan: om 


15. $M_{U,V}(u, v) = (1 - 2v)^{-1} \exp\{u^{2}/(1 - 2v)\}$ for v < 1/2; ρ(U, V) = 0; no.
18. pz,w = (a7 — 9} 2) sin @ cos @/./var(Z) var(W). 


EU? 
21. If U has pdf f, then EX” = EU"/(m+ 1) form > 0; = 5 - 


nee cee 
> 8 var(U) + 2(EU)? 


Problems 4.6 


a-p b-p b-p a-U ‘ 
Lutolf (<*) —f (*=*) \/@ (*) —® (=*) ] where © is the standard 
normal df. 
2. (a) 21 + X). 3. E{X|y} = wi + pA Cy — pa). 4, E(var{Y|X}). 
6.4/9. 7.(a)1; (b) 1/4. 8.x*/(kK +1), 1/1 +k). 


Problems 4.7 


5. (a) (5) /Bs (b) s- 


jal 


Problems 5.2 


— N 
5. Frid =( je )/ (Me PO == (yn )A (py eo 2 Mt bean 


—m)! 
P(Y =M)= Ws) P(x, .-- iml¥ = y) = Renee 
i=1,...,j,x 4x; fori F j. 
9. P(Y) = x) = qp" + pq*,x > 1; PX =x) = pg? +q?p* "x2 1; 
P(Y, = x) = P(Y, = x) for n odd; = P(Y, = x) for n even. 


0O<x<y, 


Problems 5.3 
2. (a) P{ F(X) = an (4 )r d= py*} = (") p'(1— p)"*,x =0,1,... 47. 


lai 
ne(S wp Data): 


22. X/|Y| ~ C(1, 0); (2/π)(1 + z²)^(-1), 0 < z < ∞.
27. (a)t/e?; (c)=O0ift <0,=a/tift >; (d) (a/p)re"!. 
29. (b) 1/(2√π), 1/2.
Problems 5.4


1. (a) wy = 4, po = 15/4, p = —3/4, (YN (6- 2x, 8); © 0.3191. 


4. BN(aμ1 + b, cμ2 + d, a²σ1², c²σ2², ρ). 6. tan² θ = EX²/EY². 7. σ1² = σ2².


Problems 6.2 


1. No. 2. Yes.
3.Y,>Y~ F(y)=0ify <0,=1-e° ify>0. 
4. F(y) =0ify <0,=1-—e” ify>0. 
9.C(1,0). 12. No. 
13. (a) exp(-x^(-α)), x > 0; EX^k = Γ(1 - k/α), k < α;
(b) exp(-e^(-x)), -∞ < x < ∞; M(t) = Γ(1 - t), t < 1;
(c) exp{-(-x)^α}, x < 0; EX^k = (-1)^k Γ(1 + k/α), k > -α.
20. (a) Yes, no; (b) yes, no. 


Problems 6.3 
3. Yes; A_n = n(n + 1)μ/2, B_n = σ√(n(n + 1)(2n + 1)/6).


5. (a) M_n(t) → 0 as n → ∞, no; (b) M_n(t) diverges as n → ∞;
(c) yes; (d)yes; (e) M, > e”/*, no. 


Problems 6.4 
1. (a) No; (b) no. 2. No. 3. For a < 1/2. 7. (a) Yes; (b) no.


Problems 6.5 


4. Degenerate at 8. 5. Degenerate at 0. 
6. For p > 0, N(0, √p), and for p < 0, S_n/n is degenerate.


Problems 6.6 


1. (b) No; (c) yes; (d) no.
2. N(0, 1). 3. N(0, σ²/β²). 4. 163. 8. 0.0926; 1.92.


Problems 7.2 


1. P(X̄ = 0) = P(X̄ = 1) = 1/8, P(X̄ = 1/3) = P(X̄ = 2/3) = 3/8,
P(S² = 0) = 1/4, P(S² = 1/3) = 3/4.


x̄    | 1    1.5  2    2.5  3    3.5  4    4.5  5    5.5  6
p(x̄) | 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
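
The probabilities tabulated above match the distribution of the mean of two independent draws from the uniform distribution on {1, 2, ..., 6}. Assuming that is the setting of the problem (the problem statement itself is not reproduced here), the table can be regenerated by brute-force enumeration. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 equally likely ordered pairs give each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))

# Exact distribution of the sample mean as {value: probability}.
dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

for xbar, c in sorted(counts.items()):
    print(float(xbar), f"{c}/36")
```

Using `Fraction` keeps the probabilities exact, so they can be compared with the table entry by entry.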


Problems 7.3 
1. {F(min(x, y)) - F(x)F(y)}/n.
6. E(S²)^k = σ^(2k)(n - 1)(n + 1) ... (n + 2k - 3)/(n - 1)^k, k ≥ 1.
9. (a) P(X̄ = t) = e^(-nλ)(nλ)^(tn)/(tn)!, t = 0, 1/n, 2/n, ...; (b) C(1, 0);
(c) Γ(nm/2, 2/n). 10. (b) 2/√(αn); 3 + 6/(αn).


= a2 — 
11.0, 1,0, E(X, — 0.5)4/(144n"). 12. var(S?) = 4 ( + a) > var(X). 
= 


Problems 7.4 


2. n(m + δ)/[m(n - 2)]; 2n²{(m + δ)² + (n - 2)(m + 2δ)}/[m²(n - 2)²(n - 4)].
n—-1 


| 
ft) n> 14 y= 18 ee n>2. 
ey at Ere 


11. 2m"? n"?2(n + mer)-(m4n)/2gim /B ( ), —-0 <z7< 00. 


mn 
D2 
Problems 7.5 
1. (a) AN(μ², 4μ²σ_n²) for μ ≠ 0, X̄²/σ_n² → χ²(1) for μ = 0, σ_n² = σ²/n;

(b) for μ ≠ 0, 1/X̄ ~ AN(1/μ, σ_n²/μ⁴); for μ = 0, σ_n/X̄_n → 1/N(0, 1);

(c) for μ ≠ 0, ln|X̄| ~ AN(ln|μ|, σ_n²/μ²); for μ = 0, ln(|X̄|/σ_n) → ln|N(0, 1)|;

(d) AN(e^μ, e^(2μ)σ_n²).
2. c = 1/2 and √X ~ AN(√λ, 1/4).


Problems 7.6 


2\* = = 
Lt(n—1). 2.t(mt+n—2). 3 (=) r(“* +4) /r(). 


Problems 7.7 


Toe ~(n/24+1) 
y+ 2 us eae: 


_ p2yj-1/2 
3, [22 (1 — p*)I [ + a7 | 


4. √(n - 1) T ~ t(n - 1).


Problems 8.3 
7. fa,(x)/fo,(x). 9.No. 10. No. 


11. (b) Xq;— (e) (X, 8); (g) (I Xi, Ia = x») (h) X (ay, Xz, --- » Xqny)- 
! 1 


Problems 8.4 


2 


2. (44) a" ae ea) : 
3, $2 = %=18?, var(S?) = (21)? 25 < var(S?) = 2. 4. No. 5.No. 


6 ("72)/(")ossstsnt=Sixs w=(7)7(") if0<t<s; 
=2/(") if =s,and CA) ifs+i<t<n. 


Gea Cea li. (a) NX/n; (b) no. 


12.t= Dtx;,1—(1— 2)" ift >t, and 1 ifr <t. 

13. (a) With 1 = =x; a op ee oD) gt fon n,t>s;  (c)(l—1/n); 
(4) 1 -1/ny 1+ 

14. With t = x), [EY (t) — (¢ — I" WO — Dt" — (ff - "> 1. 


15. With t = 7" x;, () ef a-5". 


Problems 8.5 


1. (a), (c), (d) Yes; (b) no. 2. 0.64761/n?. 
3.07! sup{x?/[e™ — 1]}. 5. 20(1—0)/n. 


x#0 
Problems 8.6 
2. B = (n — 1)S?/(nX), & = ee 3. @=X,6? =(n— 1)S2/n. 
4.@= XX - X[K? — FY, XK? = XP/n B = - KK - KO)? - KY, 
5. i = In{X /(X2]'7}, 6 Paik ye X}/n. 
Problems 8.7 


1. (a) med(X;);_ (b) Xqy3_ (©) n/ DX; ) —n/ Nn — Xj). 
2.(a)X/n; (b) 6, = 1/2 if X < 1/2, = X if 1/2 < X < 3/4, =3/4 if ¥ > 3/4; 


%, ifX>0 A X — 
Qe: (? #X 20 TH Ne 
6 = -% ~/xX?+(4% 2, X2 = Yo X2/n; 


()é=— ss ta ifn), n3 > O; = any value in (0,1) ifn; = n3 = 0; 
no mle if n; = 0,n3 ¥ 0; no mle if n; 4 0,n3 = 0; 
()6=—-}+4V14+4x2, (O=x 

3. i = —~O""(m/n). 

4. (a) @= Xa, B= Yi(X; -&)/n;  (b) A= Pyp(X1 > 1) =e" a <1, 
=la>l; A=1ifé>1, = exp{(@ — 1)/B} if a < 1. 

5. θ̂ = 1/X̄. 6. μ̂ = Σ ln X_i/n, σ̂² = Σ(ln X_i - μ̂)²/n.

8. (a) N = ML X iy) — 1; (b) Xap. 

9. fi; = ry Xijjn=X,,i=1,2,...,5, 6? = BE(K — Xj)?/(ns). 

11. =X. 13.4) =(X/n)?. 15. fi = max(X, 0). 

16. p̂_j = X_j/n, j = 1, 2, ..., k - 1.
Problems 8.8


2.(a) (Sy +/+); (AY 3X 5X Yn. 


6. (X + 1)(X +n)/(@™ + 2)(n + 3)]. 8. (@ +n) max(a, Xqy)/(a +n — 1). 


Problems 8.9 


5. (c) (n + 2)[ (Ky /2))— FP — (XP + DUK /2)- — Kye I. 
10. (£X;)'T (n + k)/ Tn + 2k). 


Problems 9.2 


1.0.019, 0.857. 2k = Wo + azq//n, 1 — © (zy — MO J/n). 
5. exp(-2), exp(-2/θ), θ > 1.


Problems 9.3 


1. φ(x) = 1 if x < (1 - √(1 - α)), = 0 otherwise.

4. φ(x) = 1 if ||x| - 1| > k. 5. φ(x) = 1 if x_(1) > c = θ0 - ln(α^(1/n)).
11. If 9 < 0), O(x) = Lif xq) > Opa”, and if @ < , then @(x) = 1 
if x(1 < H(1 alr)! 


12. φ(x) = 1 if x < √(α/2) or > 1 - √(α/2).


Problems 9.4 


1. (a), (b), (c), and (d) have MLR in Σ X_i; (e) and (f) in Π X_i.
4. Yes. 5. Yes, yes. 


Problems 9.5 


1. φ(x1, x2) = 1 if |x1 - x2| > c, = 0 otherwise, c = √2 z_{α/2}.
2. φ(x) = 1 if Σ x_i > k. Choose k from α = P_{θ0}(Σ X_i > k).


Problems 9.6 


3. φ(x) = 1 if (no. of x_i's > 0 - no. of x_i's < 0) > k.


Problems 10.2 


2. ¥ =# of x;, x insample, Y <c, or Y > cp. 3. X <c,or> cp. 
4. S² > c1 or < c2. 5. (a) X_(n) > N0; (b) X_(n) > N0 or < c.

6. |X — 0/2| >. T(ayX <cjor>a@; (b)X>c. 

11. Xi > A — In(a)!/". 12. Xa > Opa '/", 


Problems 10.3 


1. Reject at α = 0.05. 3. Do not reject H0 : p1 = p2 = p3 = p4 at 0.05 level.
4. Reject H0 at α = 0.05. 5. Reject at 0.10 but not at 0.05 level.


7. Do not reject H0 at α = 0.05. 8. Do not reject H0 at α = 0.05.
10. U = 15.41. 12. P-value = 0.5447.
Problems 10.4 


1. t = -4.3, reject H0 at α = 0.02. 2. t = 1.64, do not reject H0.
5. t = 5.05. 6. Reject H0 at α = 0.05. 7. Reject H0. 8. Reject H0.


Problems 10.5 


1. Do not reject H0 : σ1 = σ2 at α = 0.10.
3. Do not reject H0 at α = 0.05. 4. Do not reject H0.


Problems 10.6 


2. (a) φ(x) = 1 if Σx_i = 5, = 0.12 if Σx_i = 4, = 0 otherwise;
(b) minimax rule rejects H0 if Σx_i = 4 or 5, and with probability 1/16 if Σx_i = 3;
(c) Bayes rule rejects H0 if Σx_i > 2.

3. Reject H0 if X̄ < (1 - 1/n) ln 2;
β(1) = P(Y < (n - 1) ln 2), β(2) = P(Z < (n - 1) ln 2) where Y ~ G(n, 1), and
Z ~ G(n, 1/2).


Problems 11.3 


1.(77.7,84.7). 2.n=42. 7. (B% 2EKi (don): 


2n,aj2 
9. (2X/(2 — Ay), 2X/(2 — A2)), AZ — A = 4 — 2). 10. [a'/"N}. 


In(t /a) 
In > qataxap 


12. Choose k from α = (k + 1)e^(-k). 13. X̄ + z_α σ/√n.

14. (2X? /c., UX? /c,) where pe x? (y)dy = | — a, and yx2(y)dy = n(1 — @). 
15. Posterior B(n + α, Σx_i + β - n).

16. A(ulx) = /= exp{—3(u — ¥)}[@(/n(1 — x)) — &(—/n(1 + X))], where © 


is standard normal df. 


Problems 11.4 


L. (Xa) — X3q/(2n), Xa). 

2. (2nX̄/b, 2nX̄/a), choose a, b from ∫_a^b χ²_{2n}(u) du = 1 - α, and a² χ²_{2n}(a) = b² χ²_{2n}(b),
where χ²_ν(x) is the pdf of χ²(ν) rv.

3. (X/(1 - b), X/(1 - a)), choose a, b from 1 - α = b² - a² and a(1 - a)² = b(1 - b)².

4.n = [4z}_y./d?} + 1; 0 > (1/a)In(1/a). 


Problems 11.5 


1. (Xen, a7" X any). . 
2. (2UX;/2, 20.X;/2;) where A), Az are solutions of A} fona(A1) = Az fone (Az) and 
P(1) = 1 - α, f_ν is χ²(ν) pdf.
2 
3. (Xa) a “a. Xi). 5. (aX), Xqy)- 8. Yes. 


Problems 12.3 


n&(ti—1)?/ Et? 


‘ «¢ eo-a6| 
4, Reject Ho : ao = a7 Se reer wearer > c 
: ° E(¥j—Gq—4y1))? /(n—2) 


8. Normal equations β̂0 Σ x_i^k + β̂1 Σ x_i^(k+1) + β̂2 Σ x_i^(k+2) = Σ y_i x_i^k, k = 0, 1, 2.
Reject Ho : Bo = Oi (Ipal//c}/\/ B(% — Bo — Bix: — Bax?) > co where 


By = Eci¥; and fy = ¥ — Bix, B, = E(x; — ¥)(¥; — Y)/ Dui — ¥). 
10. (a) β̂0 = 0.28, β̂1 = 0.411; (b) t = 4.41, reject H0.


Problems 12.4 


2. F = 10.8. 3. Reject at α = 0.05 but not at α = 0.01.
4. BSS = 28.57, WSS = 26, reject at α = 0.05 but not at 0.01.
5. F = 56.45. 6. F = 0.87.


Problems 12.5 


4. SS methods = 50, SS ability = 64.56, ESS = 25.44; reject H0 at α = 0.05, not at 0.01.
5. F_variety = 24.00.


Problems 12.6 


bi 
. 2 amyni0;.- 9) 
2. Reject Ho if ————--——, > c 
ee SEE ip 
4. SS1 (machines) = 2.786, d.f. = 3; SSI = 73.476, d.f. = 6;
SS2 (machines) = 27.054, d.f. = 2; SSE = 41.333, d.f. = 24.


5. Source         d.f.   SS        F
Cities         3      227.27    4.22
Auto           3      3695.94   68.66
Interactions   9      9.28      0.06
Error          16     287.08
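
The F values in the two-way table of answer 5 can be recomputed from the tabulated figures. The sketch below assumes the three numeric columns are degrees of freedom, sum of squares, and F ratio, with each F formed as the row mean square divided by the error mean square. The variable names are illustrative.

```python
# Rows of the table: source -> (degrees of freedom, sum of squares).
table = {
    "Cities": (3, 227.27),
    "Auto": (3, 3695.94),
    "Interactions": (9, 9.28),
}
error_df, error_ss = 16, 287.08

# Error mean square, the common denominator of every F ratio.
mse = error_ss / error_df

# F ratio for each source: (SS / d.f.) / MSE.
f_ratios = {name: (ss / df) / mse for name, (df, ss) in table.items()}

for name, f in f_ratios.items():
    print(f"{name}: F = {f:.2f}")
```

Rounded to two decimals, the computed ratios reproduce the tabulated 4.22, 68.66, and 0.06, which supports reading the columns in that order.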

Problems 13.2 


1. d is estimable of degree 1; (number of x_i's in A)/n.
2. (a) (mn)^(-1) Σ X_i Σ Y_j; (b) S1² + S2².
3. (a) Σ X_i Y_i/n; (b) Σ(X_i + Y_i - X̄ - Ȳ)²/(n - 1).


Problems 13.3 


3. Do not reject H0. 7. Reject H0. 10. Do not reject H0 at 0.05 level.
11. T+ = 133, do not reject H0.
12. (2nd part) T+ = 9, do not reject H0 at α = 0.05.
Problems 13.4


1. Do not reject H0. 2. (a) Reject; (b) reject.
3. U = 29, reject H0. 5. d = 1, do not reject H0.
7. t = 313.5, z = 3.73, reject; r = 10 or 12, do not reject at α = 0.05.


Problems 13.5 


1. Reject H0 at α = 0.05. 4. Do not reject H0 at α = 0.05.
9. (a) t = 1.21; (b) r = 0.62; (c) reject H0 in each case.


Problems 13.6 
1.(a)5;  (b) 8. 3. p?*(n + p—np) <1. 


4.n > (z1-y<S poll — po) — 21-3/pi Cl — pi))*/(pr — po)’. 


Problems 13.7 


1. (c) E{n(X̄ - μ)²}/ES² = 1 + 2ρ(1 - 2ρ/n)^(-1); ratio = 1 if ρ = 0, > 1 for ρ > 0.
2. Chi-square test based on (c) is not robust for departures from normality. 
of sample quantile, 338
of sample range, 176 
Distribution function, 44, 45, 103 
continuity points of a, 44, 51 
of a continuous type RV, 50 
convolution, 141 
decomposition of a, 55 
discontinuity points of a, 44 
of a discrete type RV, 49 
of a function of an RV, 57 
of an RV, 45 
of multiple RVs, 103 
Domain of attraction, 294 


Efficiency of an estimator, 402 
relative, 402 

Empirical DF = sample DF, 310 

Equal likelihood, 1

Equivalent RVs, 123 

Equivariant estimator, 356, 445 

Estimable function, 377, 599 

Estimate, 353 
Estimable parameter, 599
degree, 600, 605 
kernel, 600, 605 
Estimator, 353, 354 
equivariant, 356, 445 
Hodges—Lehmann, 657 
James-Stein, 451 
L-, 657 
least squares, 563 
M-, 657 
minimum risk equivariant, 446 
Pitman, 448-449, 451-452 
point, 354 
R-, 657 
Event, 3 
certain, 9 
elementary = simple, 3 
disjoint = mutually exclusive, 7, 35 
independent, 34 
null, 9 
Exchangeable random variables, 124, 156, 317 
Expectation, conditional, 165 
properties, 165 
Expected value = mean = mathematical 
expectation, 69 
of a function of RV, 141 
of product of RVs, 154 
of sum of RVs, 154 
Exponential distribution, 135, 215 
characterizations, 215-217
memoryless property of, 216 
MGF, 215 
moments, 215 
Exponential family, 251 
k-parameter, 253 
natural parameters of, 254 
one-parameter, 251 
Extreme value distribution, 233 


Factorial moments, 86 
Finite mixture density function, 235 
Finite population correction, 318 
Fisher information, 393 
Fisher—Irwin test, 502 
Fisher’s Z-statistic, 333 
Fitting of distribution, binomial, 511 
geometric, 511
normal, 505 
Poisson, 506 
Fréchet, Cramér, and Rao inequality, 391 
Fréchet, Cramér, and Rao lower bound, 391
binomial, 393 
normal, 397 
one-parameter exponential family, 396 
Poisson, 394 

F-distribution: 

central, 330, 341 
moments of, 330 
noncentral, 332 
moments of, 332 
F-test(s), 518 
of general linear hypothesis, 566 
as generalized likelihood ratio test, 496, 566 
for testing equality of variances, 518 


Gamma distribution, 212 
bivariate, 117 
characterizations, 216 
MGF, 212 
moments, 212 
relation with Poisson, 218 
Gamma function, 211 
General linear hypothesis, 561
canonical form, 567 
estimation in, 562 
GLR test of, 566 
Generalized likelihood ratio test, 491
asymptotic distribution, 498 
F-test as, 496, 566 
for general linear hypothesis, 566 
for parameter of, binomial, 492 
for simple vs. simple hypothesis, 491 
bivariate normal, 499 
discrete uniform, 499 
exponential, 499 
normal, 499 
Generating functions, 85-86 
moment, 87 
probability, 86 
Geometric distribution, 86, 172, 187 
characterizations, 189, 204 
memoryless property of, 189 
MGF, 187
moments, 187 
order statistics, 172 
PGF, 86 
Glivenko—Cantelli theorem, 311 
Goodness-of-fit problem, 504-505 


Hazard (= failure rate) function, 237
Helmert orthogonal transformation, 342 
Hodges—Lehmann estimators, 657 
Hölder’s inequality, 159
Hypergeometric distribution, 191 
bivariate, 117 
mean and variance, 191 
Hypothesis, tests of, 454
alternative, 455 
composite, 455 
null, 455 
parametric, 455 
simple, 455 


Identically distributed RVs, 123 
Implication rule, 12 
Inadmissible decision rule, 440 
Independence and correlation, 151 
Independence of events, 34 
complete = mutual, 35 
pairwise, 35 
Independence of RVs, 119, 123 
complete = mutual, 122 
pairwise, 122 
Independent, identically distributed RVs, 123 
sequence of RVs, 123 
Indicator function, 41 
Induced distribution, 61 
Infinitely often, 281 
Interactions, 590
Invariance, of hypothesis testing problem, 482 
principle, 484 
Invariant: 
decision problem, 443 
family of distributions, 442 
function, 445, 482 
location, 445 
location-scale, 445 
loss function, 443 
maximal, 482 
scale, 445 
statistic, 445, 482 
Invariant, class of distributions, 442 
maximal, 447, 482 
tests, 482 
UMP tests, 483 
Inverse Gaussian PDF, 238 


James-Stein estimator, 451
Joint: 
DF, 103, 105 
PDF, 107 
PMF, 106 
Jump, 48, 106 
Jump point, of a DF, 48, 106 


Kendall’s sample tau, 637 
distribution of, 637 
generating function, 95 

Kendall’s tau coefficient, 636

Kendall’s sample tau test, 637
Kernel, symmetric, 600, 605
Kolmogorov’s, inequality, 284 
strong law of large numbers, 288 
Kolmogorov—Smirnov one sample statistic, 608 
for confidence bounds of DF, 613 
distribution, 609, 610-611 
Kolmogorov-Smirnov test:
comparison with chi-square test, 612 
one-sample, 611 
two-sample, 627 
Kolmogorov-Smirnov two sample statistic, 627
distribution, 628 
Kronecker lemma, 285 
Kurtosis, 85 


L-, M-, and R-estimators, 657 
Laplace (= double exponential) distribution, 93, 234
MGF, 94, 234
Least square estimation, 563 
principle, 563 
restricted, 563 
Level of a test, 456 
L’Hospital rule, 296
Likelihood:
equal, 1
equation, 410 
equivalent, 370 
function, 410 
Limit inferior, 12 
set, 12 
superior, 12 
Lindeberg central limit theorem, 298 
Lindeberg—Levy CLT, 296 
Lindeberg condition, 298 
Linear combinations of RVs, 154
mean, 154 
variance, 155 
Linear dependence, 151 
Linear model, 562 
Linear regression model, 564, 569 
confidence intervals, 573 
estimation, 570 
problem, 564 
testing of hypotheses, 571-572 
Locally most powerful test, 487 
Location family, 204 
Location-scale family, 204 
Logistic distribution, 232 
Lognormal distribution, 91, 231 
Loss function, 355, 424 
Lower bound for variance, Chapman, 
Robbins, and Kiefer inequality, 397 
Fréchet, Cramér and Rao inequality, 391 
Lyapunov condition, 300
Lyapunov inequality, 99 


Maclaurin expansion of an MGF, 88, 94 
Mann-Whitney statistic, 629 
moments, 606-67 
null distribution, 630 
Mann-Whitney—Wilcoxon test, 629 
Marginal: 
DF, 110
PDF, 109 
PMF, 109 
Markov’s inequality, 96 
Maximal invariant statistic, 447, 482 
function of, 483 
Maximum likelihood estimation, principle of, 410
Maximum likelihood estimator, 410 
asymptotic normality, 419-420 
consistency, 419-420 
as a function of sufficient statistic, 415 
invariance property, 418 
Maximum likelihood estimation method applied to:
Bernoulli, 413 
binomial, 422 
bivariate normal, 423 
Cauchy, 422 
discrete uniform, 411 
exponential, 418 
gamma, 415, 418 
geometric, 422 
hypergeometric, 412 
normal, 411 
Poisson, 421 
uniform, 412, 416 
Mean square error, 150, 354, 380 
Median, 82, 84 
Median test, 625 
Memoryless property: 
of exponential, 216 
of geometric, 189 
Method: 
of CF or MGF, 141 
of DFs, 128 
of transformations, 132 
Methods of finding confidence interval: 
Bayes, 538 
for large samples, 540 
pivot, 533 
test inversion, 535 
Method of moments, 406-407 
applied to: 
beta, 409 
binomial, 408
gamma, 409 
lognormal, 409 
normal, 409 
Poisson, 407 
uniform, 407 
Minimal sufficient statistic, 371 
for beta, 376 
for gamma, 376 
for geometric, 376 
for normal, 372, 440 
for Poisson, 376 
for uniform, 372, 375 
Minimax, estimator, 425 
principle, 425 
solution, 521 
Minimax estimation, for parameter of Bernoulli, 425
binomial, 436 
hypergeometric, 438 
Minimum mean square error estimator, 387 
for variance of normal, 387 
Minimum risk equivariant estimator, 446 
for location parameter, 448-449 
for scale parameter, 451-452 
Mixing proportions, 234 
Minkowski inequality, 160 
Mixture density function, 234 
Moment: 
about origin, 72 
absolute, 72 
central, 79 
condition, 75 
of conditional distribution, 165
of DF, 72 
factorial, 80 
of functions of multiple RVs, 149 
inequalities, 95 
lemma, 76 
non-existence of order, 77 
of sample covariance, 319 
of sample mean, 315 
of sample variance, 316 
Moment generating function, 87 
continuity theorem for, 290 
differentiation, 88 
existence, 89 
expansion, 88 
limiting, 289 
of linear combinations, 145 
and moments, 90 
of multiple RVs, 142 
of sample mean, 320 
series expansion, 88 
of sum of independent RVs, 145
uniqueness, 88 
Moments, 69 
factorial, 86 
Monotone likelihood ratio, 472 
for hypergeometric, 475 
for one-parameter exponential family, 473 
UMP test for families with, 474 
for uniform, 473 
Most efficient estimator, asymptotically, 402 
as MLE, 417 
Most powerful test, 457 
for families with MLR, 474 
as a function of sufficient statistic, 466 
invariant, 483 
Neyman-—Pearson, 464 
similar, 480 
unbiased, 479 
uniformly, 457 
Multidimensional RV = multiple RV, 102
continuous, 107 
discrete, 106 
Multinomial coefficient, 25 
Multinomial distribution, 198 
MGF, 198 
moments, 199 
Multiple decision problem, 524 
Bayes solution, 524 
Multiple RV, 102 
continuous type, 107 
discrete type, 106 
functions of, 127 
Multiplication rule, 29 
Multivariate hypergeometric distribution, 200 
Multivariate negative binomial distribution, 201 
Multivariate normal, 245 
dispersion matrix, 247 


Natural parameters, 254 
Negative binomial (= Pascal or waiting time) distribution, 185
bivariate, 117
central term, 202
mean and variance, 186
MGF, 186
Negative hypergeometric distribution, 193 
mean and variance, 194 
Neyman—Pearson lemma, 464 
Neyman—Pearson lemma applied to: 
Bernoulli, 468 
normal, 470 
Noncentral, chi-square distribution, 326 
F-distribution, 332 
t-distribution, 329 
Noncentrality parameter, chi-square, 326
F-distribution, 332 
t-distribution, 329 

Noninformative prior, 432 


Nonparametric = distribution-free estimation, 599
methods, 598
Nonparametric unbiased estimation, 599 
of population mean, 601 
of population variance, 601 
of tail probability, 601 
Normal approximation: 
to binomial, 303 
to Poisson, 303 
Normal distribution = Gaussian law, 90, 226 
bivariate, 138, 170, 238 
characteristic function, 90 
characterizations, 229 
contaminated, 651, 654 
folded, 452 
as limit of beta, 298 
as limit of binomial, 303 
as limit of geometric, 297 
as limit of Poisson, 291, 303 
MGF, 226
moments, 227 
multivariate, 245 
singular, 242 
as stable distribution, 294 
standard, 115, 225 
tail probability, 228 
truncated, 115 
Normal equations, 563 


Odds, 8 
Order statistic, 171 
is complete and sufficient, 599 
joint PDF, 173 
joint marginal PDF, 175 
kth, 171 
marginal PDF, 174 
uses, 644 
moments, 177 
Ordered samples, 22 
Orders of magnitude, o and O notation, 290


Parameter(s), of a distribution, 69, 204, 598 
estimable, 599 
location, 204 
location-scale, 204 
order, 69, 80 
scale, 204 
shape, 204 
space, 354 
Parametric statistical hypothesis, 456
alternative, 455 
composite, 455 
null, 455 
problem of testing, 454 
simple, 455 
Parametric statistical inference, 306 
Pareto distribution, 84, 231 
Partition, 368 
coarser, 370 
finer, 370 
minimal sufficient, 370 
reduction of a, 370 
sets, 369 
sub-, 370 
sufficient, 369 
Permutation, 23 
Pitman estimator of: 
location, 448-449 
scale, 451-452 
Pitman’s asymptotic relative efficiency, 658 
Pivot, 533 
Point estimator, 354 
Poisson DF, as incomplete gamma, 218 
Poisson distribution, 58, 84, 194 
central term, 208 
characterizations, 195-196 
coefficient of skewness, 85 
kurtosis, 85 
as limit of binomial, 202 
as limit of negative binomial, 203 
mean and variance, 194 
MGF, 88
moments, 84 
PGF, 86 
truncated, 115 
Polya distribution, 192 
Pooled sample variance, 513 
Population, 306 
Population distribution, 307 
Posterior probability, 31 
Principle of: 
equivariance, 442, 445 
inclusion—exclusion, 10 
invariance, 484 
least squares, 563 
MLE, 410 
Prior probability, 31 
Probability, 7 
addition rule, 9 
axioms, 7 
conditional, 28 
continuity of, 14 
countable additivity of, 7 
density function, 50 
distribution, 43 
equally likely assignment, 7 
on finite sample spaces, 21 
generating function, 86 
geometric, 14 
integral transformation, 208 
mass function, 48 
measure, 7 
monotone, 9 
multiplication rule, 29 
posterior and prior, 31 
principle of inclusion-exclusion, 10 
space, 2, 8 
subadditivity, 9 
subtractive, 9 
tail, 74 
total, 29 
uniform assignment of, 7 
Probability integral transformation, 208 
Problem: 
of location, 614 
of location and symmetry, 614 
of moments, 90 
P-value, 462 


Quadratic form, 238 
Quantile of order p = (100p)th percentile, 81 


Random, 14, 16 
Random experiment = statistical experiment, 3 
Random interval, 528 
coverage of, 528 
Random sample, 14, 23 
from a finite population, 23 
from a probability distribution, 14 
Random sampling, 307 
Random set, family of, 528 
Random variable(s), 41, 102 
bivariate, 106 
continuous type, 50, 107 
discrete type, 48, 106 
distribution of, 43 
degenerate, 49, 180 
equivalent, 123 
exchangeable, 124, 156, 317 
functions of a, 57 
multiple = multivariate, 102 
standardized, 80 
symmetric, 71 
symmetrized, 125 
truncated, 115 
uncorrelated, 151 
Range, 176
Rank correlation coefficient, 640 
Rayleigh distribution, 233 
Realization of a sample, 307 
Rectangular distribution, 207 
Regression: 
model, 564, 569 
coefficient, 346 
function, 574 
linear, 564, 569 
Regularity conditions of FCR inequality, 391
Risk function, 355, 424 
Robust estimator(s), 657 
Robust test(s), 660 
Robustness: 
of chi-square test, 657 
of sample mean as an estimator, 651 
of sample standard deviation as an estimator, 652
of Student’s t-test, 655 
Robust procedure, defined, 650, 657 
Rules of counting, 22 
Run, 632 
Run test, 632 


Sample, 306, 307 
correlation coefficient, 313 
covariance, 312 
DF, 310 
mean, 308 
median, 313 
distribution of, 322 
MGF, 312 
moments, 311 
ordered, 22 
point, 3 
quantile of order p, 313, 338 
random, 307 
realization of, 307 
regression coefficient, 351 
space, 3, 307 
statistic, 307, 311 
standard deviation, 308 
variance, 308 
Sampling: 
from a finite population, 23, 308 
from an infinite population, 308 
simple random, 308 
Sample space, 3 
continuous, 3 
discrete, 3 
finite, 3 
uncountable, 3 
Sampling with and without replacement, 22-23, 308
Sampling from bivariate normal, 344
distribution of sample correlation coefficient, 347
independence of sample mean vector and dispersion matrix, 345
Sampling from univariate normal, 339
distribution of sample variance, 340
independence of X̄ and S², 340
Scale family, 204
Sequence of events, 12
limit inferior, 12
limit set, 12
limit superior, 12
nondecreasing, 12
nonincreasing, 12
Set function, 7
Shortest-length confidence interval(s), 546
for the mean of normal, 547-548
for the parameter of exponential, 552
for the parameter of uniform, 551
for the variance of normal, 549
Shrinkage estimator, 451
σ-field, 3
choice of, 3
generated by a class = smallest, 41
Sign test, 614
Similar tests, 480
Single-sample problem(s), 608
of fit, 608
of location, 614
and symmetry, 614
Skewness, coefficient of, 85
Slow variation, function of, 77
Slutsky’s theorem, 270
Spearman’s rank correlation coefficient, 639
distribution, 640
Stable distribution, 225, 294
Standard deviation, 79
Standard PDF, 204
Standardized RV, 80
Statistic of order k, 171
marginal PDF, 174
Stirling’s approximation, 202
Stochastic ordering, 625
Strong law of large numbers, 281
Borel’s, 287
Kolmogorov’s, 288
Student’s t-distribution:
central, 327
bivariate, 352
moments, 329
noncentral, 329
moments, 329
Student’s t-statistic, 327
Student’s t-test, 512, 513
as generalized likelihood ratio test, 493
for paired observations, 515
robustness of, 655
Substitution principle, 406
estimator, 406
Sufficient statistic, 359, 599
factorization criterion, 361
joint, 362
Sufficient statistic for, Bernoulli, 362
beta, 374
discrete uniform, 363
gamma, 374
lognormal, 375
normal, 363
Poisson, 360
uniform, 364
Support, of a DF, 51, 106
Survival function, 237
Symmetric DF or RV, 71
Symmetrization, 125
Symmetrized RV, 125
Symmetry, center of, 71


Tail probabilities, 74
Test(s), α-similar, 480
chi-square, 500
critical = rejection region, 456
critical function, 456
of hypothesis, 456
F-, 518
invariant, 482
level of significance, 456
locally most powerful, 487
most powerful, 455
nonrandomized, 457
one-tailed, 513
power function, 457
randomized, 457
similar, 480
size, 457
statistic, 458
Student’s t, 512, 513
two-tailed, 513
unbiased, 479
uniformly most powerful, 457
Testing the hypothesis of:
equality of several normal means, 561
goodness-of-fit, 505, 608
homogeneity, 507-508
independence, 633, 635, 639
Tests of hypothesis:
Bayes, 523
GLR, 491
minimax, 521
Neyman-Pearson, 464
Tests of hypothesis listed:
chi-square tests, 500
F-tests, 518
t-tests, 512
Tests of location, 614
sign test, 614
Wilcoxon signed-rank, 617
Tolerance coefficient, 644
Tolerance interval, 644
Total probability rule, 29
Transformation, 57, 128
of continuous type, 60, 128
of discrete type, 58, 128
Helmert, 342
Jacobian of, 133
not one-to-one, 134
one-to-one, 58, 133
Triangular distribution, 53
Trimmed mean, 657
Trinomial distribution, 198
Truncated distribution, 114
Truncated RVs, 115
Truncation, 114
t-statistic, 327
Two-point distribution, 180
Two-sample problems, 624
Types of error in testing hypotheses, 456


Unbiased confidence interval, 553, 555-556
general method of construction, 553
for mean of normal, 554
for parameter of exponential, 559
for parameter of uniform, 559
for variance of normal, 556
Unbiased estimator, 377
best linear, 379
and complete sufficient statistic, 383
LMV, 379
and sufficient statistic, 382
UMVU, 379
Unbiased estimation for parameter of:
Bernoulli, 384, 387
bivariate normal, 389
discrete uniform, 388
exponential, 388
hypergeometric, 388
negative binomial, 387
normal, 384
Poisson, 382
Unbiased test, 479
for mean of normal, 481
and similar test, 480
UMP, 479
Uncorrelated RVs, 151
Uniform distribution, 59, 73, 207
characterization, 209
discrete, 182
generating samples, 208
MGF, 207
moments, 73, 207
statistic of order k, 176, 221
truncated, 115
UMP test(s), 457, 479, 480, 483
α-similar, 480
invariant, 483
unbiased, 479
U-statistic, 600
for estimating mean and variance, 601
one sample, 600
two sample, 605


Variance, 79
properties of, 79
of sum of RVs, 155
Variance stabilizing transformations, 336


Weak law of large numbers, 274, 275, 278
centering and norming constants, 274
Weibull distribution, 233
Welch approximate t-test, 514
Wilcoxon score statistic, 629
Wilcoxon signed-ranks test, 617
Wilcoxon statistic, 617
distribution, 618-619, 622
generating function, 95
moments, 622
Winsorization, 116